Qi Compiler Sync Jan 5 2024

Returning to the Surface

Qi Compiler Sync Jan 5 2024

Adjacent meetings: Previous | Up | Next

Summary

We discussed remaining items and blockers for next week's planned release, and did some final checks to verify that we're on track. We also discussed a proof-of-concept for a simple compiler extension mechanism that could be used to deforest racket/list while remaining external to the Qi library.

Background

We're almost there!

A Simple Compiler Extension Mechanism

We've talked before about ways to extend the compiler that would allow us to support something like this:

(require qi qi/list)

... and this should result in all racket/list APIs being deforested when used in a Qi context.

Dominik showed a set of simple modules requiring one another in the appropriate configuration (and connected by use of a shared mutable value at compile time? TODO) so that the above could be achieved.

One potential problem with this approach is that if there are two libraries A and B that both extend the Qi compiler, then they would independently test Qi + A and Qi + B, but the interaction between the two would remain unmodeled and their use together, Qi + A,B, would exhibit undefined and perhaps unexpected behavior.

Members only!

We agreed that this would not be a problem in practice since we would consider this mechanism to be "internal" to Qi developers rather than a publicly supported extension mechanism. Of course, they say that any bug in production will eventually become a feature as users build applications around it, so there is still the possibility. It's not unlikely that tomorrow, someone may see an opportunity to deforest their pet data structure using the same "internal" approach, and before we know it, these interactions could get murky. Not to mention, even for internal developers, it may prove challenging to test these interactions in a clean way.

Typed Racket compile-time effects?

We wondered whether this approach could run into any problems and whether we should consult with Typed Racket developers for any concerns, as they have done similar things. Michael mentioned that Typed Racket does way more complicated things with compile-time effects and so we should be OK here.

Other benefits

This approach also allows us to benchmark the compiler with/without certain indicated passes, as each pass can be turned on and off using this approach.

Decision

In any case, we agreed that for the immediate goal of (1) deforesting racket/list while (2) keeping this functionality outside the core Qi library to keep it lean and general, this approach would work!

More Nonterminal Blues?

Recently, Syntax Spec added a new nonterminal syntax property to assist with syntax tree traversal for the purposes of applying compiler optimizations in each pass. We realized that though this property is attached by the expander (implemented in Syntax Spec) and tests (where we attach it indiscriminately), yet, if we synthesize any syntax in compiler passes it won't be propagated. This won't miscompile anything but there's a chance it could miss optimizations.

We reviewed the actual normalizations and found that it would only miss an optimization if there was a synthesized flow in a template that contained a subflow that would have matched an optimization. For instance, in this normalization rule:

    [(thread _0 ... (pass f) (amp g) _1 ...)
     #'(thread _0 ... (amp (if f g ground)) _1 ...)]

... it would fail to propagate the nonterminal property to (amp (if f g ground)).

We didn't find any instances of this in the current rules (in this specific rule, there are no further optimizations that would apply to the amp form synthesized here, either in this pass or in subsequent passes like deforestation), so we concluded that there is no impact from this issue at this time, and it is not a blocker for release. We agreed that a more robust solution would be needed going forward, however, for instance, a syntax tree traversal utility that is inferred by Syntax Spec from the core language grammar.

Marshalling Arts

Though the addition of the nonterminal syntax property fixed a recent bug we'd encountered, it revealed another problem. Last time, we decided it wasn't important to fix it since it only affected older BC versions of Racket, but in subsequent discussion we felt that it should be easy to fix and also that it might be a clue to a real issue or of something we could profitably address.

We looked into this issue, and found that it was caused by embedding a srcloc object in generated syntax. The Racket docs do say that this should be OK, and it was last updated only on version 7.0. But of course, that's not what we were seeing and it was showing an error on versions as recent as 8.5-8.8. It seemed that there was likely a bug that was fixed in version 8.9. Ben asked a question on Discourse to the effect to report the issue so that it could be documented as needed. We avoided the issue by converting the srcloc into a vector using build-source-location-vector.

Recovering Source Syntax

We recently discussed relying on Syntax Spec to preserve source syntax through expansion, so that it can be used to implicate the right expression in the event of errors, and had initially decided that offering the stack of the full expression history through expansion and compilation would be feasible. We revisited this to understand what purpose we need the source syntax for:

For blame purposes, in error messages?
For debugging purposes, to identify faulty transformations

We agreed that for the former, preserving only the original source syntax would be sufficient, while for the latter, the Macro Stepper is already meant for this purpose. If there was another reason, Michael would consider designing the interface with the full stack available, but in the absence of one, it would make more sense to just preserve the original surface syntax, for memory and other reasons. We agreed that this was sufficient for our purposes.

Techniques for better debugging

It came up that the Beginning Student Languages add extra continuation marks to provide additional error management hooks, as well as other techniques that may be useful to study to improve our debugging infrastructure.

Code Review

We agreed that now is a good time to request code review from some Racket core contributors who are adjacent to the Qi project.

Qi Forms are Still Unlinked in the Docs

Formerly (i.e. prior to the compiler work), Qi forms were datum literals, meaning they weren't actual defined identifiers but purely patterns to be matched in the flow macro. This meant that they didn't need to be exported by the library, and consequently, they also weren't linkable in Scribble documentation the way that Racket identifiers typically are.

But with the change to using literals, we assumed that these would now be magically linked as part of (require (for-label qi)). This wasn't the case, however. We considered that it might be because these literals are defined in the qi binding space rather than in the default space. We decided to try (for-label (for-space #f qi)) after the meeting, to "flatten" the spaces into the default binding space, to see if that might do it.

Apples to Apples

Now that we have a robust and rigorous benchmarking suite in the works, we considered what the results were actually telling us. Yes, Qi was a factor of N times faster than Racket for "equivalent" functional pipeline implementations, but there are Racket for forms (for which Sam Tobin-Hochstadt provided us implementations of our benchmarks) which perform just as well or even faster. What's the right way to interpret these results?

Of course, as Qi compiles to Racket, it could never truly exceed Racket performance, and hand-written recursions or iterations would presumably always be an upper bound on performance. On the other hand, there are many different ways to write Racket code that does the same thing, and some ways are faster than other ways, including pathological ways that may be very fast or very slow for particular cases. Which of these implementations are appropriate for comparison?

The question really becomes one of comparing "equivalent" implementations, and for our purposes, we are thinking of "functional" implementations.

Now, even "functional" isn't well defined and so it's not easy to agree on what counts as functional. While for forms have been criticized in the past as being non-functional due to their encouragement of and reliance on mutation, that's not actually the case with Racket's for/list forms, which do not support arbitrary iteration but specifically construct a list. They are as "functional" as filter, map and foldl in this sense of immutability. But another sense of programming in the functional style is that of abstraction of the mechanics of the computation using higher order functions and whole collections. for/list does not meet this requirement since its use involves actually applying these functions to individual arguments chosen from collections. So it's a matter of some subjectivity whether this could be considered "equivalent."

Finally, the for forms attain their best performance depending on which form you use, and there are a great many of them, including for/list, for/sum and for/first, each specialized to some case. Additionally, the best performance is attained when using sequences derived from the input lists (e.g. in-list), and not the input lists directly. In contrast, deforestation is a general approach that applies to any composition of higher-order functions like map, filter, and foldl, which further supports that the for forms are not "equivalent" implementations to Qi's deforested functional pipelines.

We agreed that different comparisons will be relevant for different audiences and different purposes. What Qi's deforestation optimization accomplishes is the optimization of functional pipelines involving filter, map and other such standard higher order functions. Against Racket's use of the same functions, Qi is indeed much faster. At the same time, comparison against for forms and hand-written recursions would show Qi's performance against other ways we could implement the same algorithms in idiomatic Racket, and could reveal how much "headroom" there is for further improvement. In such comparisons in preliminary data, we saw that Qi is almost on par with Racket's for/list when using in-list (i.e. the fastest version).

Where to House the New Benchmarking Suite?

Although we had recently concluded that the new benchmarking suite should be added to the Qi SDK while it is still very specialized, and to aid in gathering data within our existing workflows, we reviewed that and agreed that the benchmarking suite should remain as a separate library for now, and that we will gather data "extrinsically," without hooking into the Qi repo's existing infrastructure for benchmarking, since doing that right would involve some choices (such as selecting appropriate triggers on updating the benchmarks) that may take time to work out.

Qi 4 in 25 Words

Stephen had recently asked if we could summarize the new release in 25 words or less, to be able to explain what it's all about to someone who may not be familiar. We discussed some themes that might help us come up with something:

Functional programming
Good performance of idiomatic code
Flow-oriented thinking about algorithms
Data going through pipeline and being transformed
Flow-oriented programming
Extension language on existing programs (e.g. small Qi program to perform FFT)
Linear vs branching (compared to threading)

Next Steps

(Some of these are carried over from last time)

Incorporate the new benchmark suite into the SDK (including Makefile targets) and CI workflows
Choose representative benchmarks to use in the report
Update the main documentation to reflect the compiler work
Review whether we can continue optimizing a single syntax node at many levels (this is not a blocker)
Comprehensively review error messages in Qi
Add test cases for the backwards incompatibility on datum vs literal matching.
Investigate ways to extend the compiler (e.g. language composition or compiler macros) and use that to implement qi/list (deforesting racket/list) as a proof of concept.
Prototype and then propose map1 for addition to Racket.
Preserve the source syntax through expansion in Syntax Spec.
Review the use of contracts in deforestation.
Improve expander tests to compare expressions literally, using a test for alpha-equivalence.
Try unpublished expansion event APIs to implement hierarchical reporting of expansion in the compiler.
Implement streams for the other list operations described in St-Amour's writeup (i.e. unfoldr zip average). Note: implementing average might benefit from having "multi-valued CPS" for deforestation.
Find a way to reliably detect list-like operations on values.
Rewrite these list-like operations to list-based functional pipelines, e.g. (~> (-< (>< f) (>< g)) ...)
Generalize the stream implementation to support multiple values at each step (in addition to zero or one) and recheck the performance.
Review other core Qi combinators besides ~> (e.g. -<, == and ><, maybe if and pass) and see whether it would make sense to extend fusion to them.
Should Qi provide a map utility (overriding the default one) that supports "0 or more" values (as was discussed in the RacketCon talk)?
Design a core continuation form(s?) for Qi, and show how try can be implemented in terms of it.
Describe the sequence of compilation for input expressions with the optimizing compiler against the reference compiler, showing cases in which these do not terminate.
Come up with a methodology to formalize and theoretically validate compiler behavior.
Review other bindings cases that we could support next (e.g. selective binding, packing values, and binding exceptions in try).

Attendees

Ben, Dominik, Michael, Sam, Sid

Provide feedback

Saved searches

Use saved searches to filter your results more quickly