Qi Meeting Mar 22 2024

Siddhartha Kasivajhula edited this page Apr 5, 2024 · 2 revisions

Looking Inside the Black Box

Summary

We looked at a few recent improvements to the Qi repo infrastructure, docs, and wiki. We continued the discussion from last time on how to generate good error messages and blame the "right" source syntax, and identified some avenues for further exploration. We reviewed an in-progress PR tweaking some of our compiler strategies as discussed in previous meetings. We brought up some longer-term research directions such as language composition and the theory of effects, and the need to schedule dedicated discussions for them when interested parties are available, to move them forward.

Background

A PR updating the docs to reflect the changes in Qi 4 was merged this week, and it adds the new Qi logo to the official docs!

Qi Docs

We recently merged rigorous per-commit benchmarking (using Dominik's vlibench package), so we now have both current benchmarks and performance trends, generated on every commit! Both reports are easily accessible via badges in the repo README.

Qi Repo

We've added a red Qi logo to the developer documentation in the Qi wiki. We joked that it's red because if you just want to use Qi and have fun, then you can "take the blue pill" on the regular user documentation, but if you "take the red pill," we show you how deep the rabbit hole of Qi development goes. 😆

Qi Wiki

(The real reason is that blue and red are the colors used in the official Racket logo ... or is it? 🤔)

Speaking of rabbit holes, Dominik (how did you know I was going to say Dominik?) is working on supporting more consumers in racket/list APIs and everything turned out to be super straightforward and easy. No, come on. Of course, Dominik is deep in another rabbit hole, exploring the low level, subterranean caverns of Racket's contract system on a quest to directly create and manipulate blame objects.

In the leadup to the Qi 4 release, we had suspected that compiler optimization passes may terminate prematurely in some cases, and we pinpointed the issue around that time as well. Last time, we discovered that deforestation was also terminating prematurely in a certain case, so Sid created a PR to address both problems.

In recent discussions with Ben and Michael (as well as earlier discussions with everyone in meetings), it emerged that characterizing Qi's theory of effects precisely would be necessary both for designing optimizations in the future (i.e. as part of Qi's theory of optimization) and for explaining to users precisely what the guarantees are, in order to encourage appropriate style and conventions. So Sid started a PR to address this.

Michael has some research deadlines coming up, so he stopped by briefly just to say hi!

Opening the Black Box and Closing it Again

Some time ago, we saw how a Racket expression like:

(map sqr (filter odd? (range 5 10 3 1)))

… does not provide source context in the resulting error message, whereas an analogous Qi expression does:

(~>> (3 1) (range 5 10) (filter odd?) (map sqr))

This is achieved by using the low-level contract API with producers like range in Qi's implementation of stream fusion optimization.
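As a rough illustration of the idea (not Qi's actual implementation), the low-level contract form in racket/contract lets the caller name the blamed parties, the protected value, and the source location to report. The party labels 'qi and 'user, the file name example.rkt, and the helper violation-message are all made up for this sketch.

```racket
#lang racket/base
(require racket/contract racket/list)

;; Hypothetical source location of the user's `(range 5 10)` form.
(define use-site (srcloc "example.rkt" 3 0 30 12))

;; Wrap `range` in a contract, passing the blamed parties, the value
;; name, and the location to report explicitly.
(define checked-range
  (contract (->* (real?) (real? real?) list?)
            range
            'qi      ; positive party (hypothetical label)
            'user    ; negative party (hypothetical label)
            'range   ; name of the protected value in messages
            use-site))

;; Returns the message of a contract violation raised by `thunk`,
;; or #f if no violation occurs.
(define (violation-message thunk)
  (with-handlers ([exn:fail:contract:blame? exn-message])
    (thunk)
    #f))

;; A symbol is not a real number, so the domain contract is violated.
(define msg (violation-message (lambda () (checked-range 'five))))
(displayln msg)
```

The printed message should mention the supplied location rather than some place inside the library, which is the kind of source context discussed above.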

We were also able to use the same approach for car, which is a consumer.

But it has emerged that even the low-level API can't easily be used for generating appropriate error messages for consumers in general, for instance in deforesting list-ref.

Instead, it may be necessary to directly create blame objects with the appropriate "polarity." In order to understand this, Dominik is attempting to implement the functionality of the low-level contract API using the blame primitives. It's analogous to how one might gain insight into generators and coroutines by implementing them using the primitive control construct of continuations. Likewise, Dominik hopes to understand blame objects and contracts better through this exercise. So far, though, there are a few difficulties:

  • While there are many public APIs to access and manipulate blame objects, there is no public constructor to create them.
  • By using the private constructor, so far, we haven't been able to reproduce the behavior of contracts defined using contract.
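For reference, here is a small sketch of the public side of the blame API. Since there is no public constructor, it gets a blame object the roundabout way, by triggering a contract violation and extracting the object from the exception. The function checked-add1 and the party labels 'library and 'client are invented for illustration.

```racket
#lang racket/base
(require racket/contract racket/contract/combinator)

;; A contracted function; the party labels are hypothetical.
(define checked-add1
  (contract (-> number? number?) add1 'library 'client))

;; Trigger a violation and pull the blame object out of the exception.
(define blame-obj
  (with-handlers ([exn:fail:contract:blame? exn:fail:contract:blame-object])
    (checked-add1 "not a number")
    #f))

(blame? blame-obj)                        ; #t
(blame-positive blame-obj)                ; one party to the contract
(blame-negative blame-obj)                ; the other party
(blame-swapped? blame-obj)                ; current "polarity"
(blame-swapped? (blame-swap blame-obj))   ; the opposite polarity
(blame-source blame-obj)                  ; a srcloc for the contract
```

Everything here only reads or transforms an existing blame object. Creating one from scratch with a chosen polarity is the part that currently requires the private constructor.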

We will continue playing around with this and see where it leads.

Catching Sight of the Goal

Although the details are still unclear, of all the standard exception types provided by Racket, the only one we have found that carries this kind of source location information is exn:fail:contract:blame, by way of the blame object returned by its accessor exn:fail:contract:blame-object. So it seems reasonably safe to assume that we will need to use this somehow in our error reporting, when all is said and done.

Designing Error Messages

We discussed that it would be useful to study codebases using Qi, such as Frosthaven Manager, Qi Cat, and Qi Circuit, to get a sense of what kinds of context in error messages would be useful to users. For instance, should we only include the culpable syntax, or also include some of what came before and after it?

We also felt that it would be interesting to see what other libraries, like Laurent Orseau's define2 library, do in their quest to provide better error messages for standard Racket forms, and whether we can learn anything from their approaches.

You'll Miss Me When I'm Gone

The Much Maligned De-Expander

Although we always only speak of the ✨ de-expander ✨ by surrounding it with magical sparkles to indicate at once both its cleverness and its ridiculousness, after extensively discussing alternatives to it these last few weeks, we've gradually come to appreciate what the humble de-expander does for us.

First, of course, we have always been saying things like:

  • It's brittle and hacky.
  • It's independent of our actual core language specification instead of derived from it, so will require independent maintenance.
  • It is attempting to recover (and cannot always recover) information that the expander already knew at one point.

Such nasty things to say! Really, we didn't mean any of it, de-expander, we promise! 😅

Because, when all's said and done, here's what can be said for it:

  • It works.
  • It is of low complexity.
  • It produces reasonably good surface syntax for blame purposes.

And ultimately, when you consider the two approaches of:

  • A. Building elaborate syntax tracking into our expansion and compilation pipeline to be able to identify the right syntax to blame
  • B. Doing a small amount of processing on core syntax to infer a usable source syntax at the time when an error needs to be generated

It's clear that approach A would be a much more significant undertaking, and we ought to have a compelling reason to pursue it.
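To make approach B concrete, here is a deliberately tiny sketch of a de-expander working on plain S-expressions. It is not Qi's actual de-expander: the function name deexpand is made up, real code would operate on syntax objects, and it assumes that the core form for ~> is named thread and that templates are wrapped in #%blanket-template.

```racket
#lang racket/base
(require racket/match)

;; Hypothetical, simplified de-expander: maps a core-language
;; S-expression back to something resembling surface syntax.
(define (deexpand core)
  (match core
    ;; Assumed core form for threading; recur into each flow.
    [(list 'thread flows ...) (cons '~> (map deexpand flows))]
    ;; Strip the wrapper that expansion puts around templates.
    [(list '#%blanket-template template) template]
    ;; Anything else is left as it is.
    [_ core]))

(deexpand '(thread (#%blanket-template (filter odd? __))
                   (#%blanket-template (map sqr __))))
;; => '(~> (filter odd? __) (map sqr __))
```

The low complexity is the point: a handful of rewrite rules, applied only at the moment an error needs to be reported.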

So after all this discussion, it would seem that the de-expander's challenge to us now is, "Can you come up with an alternative that either doesn't involve a lot more work, or which gets you much better results? [I rest my case.]"

Yes yes, de-expander. We hear you, and you're right to push us to higher standards.

Indeed, we agreed that any alternative to the de-expander that we consider at this point should either be less complex (unlikely), or produce similar results with little work (unlikely), or else provide much better results.

With these high standards in mind, we reviewed our options.

No One to Blame

The first "robust" alternative that we had hoped for was for Syntax Spec to preserve the surface syntax prior to expansion, so that we could implicate it in error messages generated in the compiler.

As we learned last time, it's not so clear which surface syntax to blame. It's syntax all the way up, from core Qi syntax to built-in Qi macros to towers of user-defined macros. It's not easy to know where the buck stops in the blame game.

So far, we haven't gotten anywhere with this option.

syntax/loc

This allows us to retain the syntax entered by the user as the "context" of macro expansion through the various stages of expansion. Sam (TH) mentioned on Discord that this is the approach used by Racket itself, so it definitely seems like something we should explore further. Some concerns we still have at this time are:

  • It would require a lot of changes in the codebase.
  • It would contribute maintenance overhead over time for developers to remember to include or exclude it on a case-by-case basis.
  • It's unclear whether the use of binding spaces in Qi macros may present some difficulties in propagating context.

Finally, although it might allow us to implicate a different surface syntax object than the one produced by the de-expander, it's unclear at this time whether that implicated object would be a more helpful one.

To play de-expander's advocate, this could be a lot of work to get us what the de-expander already gives us. But on the other hand, we don't understand this option well enough yet, and so we aim to explore it further to understand the potential value of doing it.
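As a reminder of what syntax/loc does at the level of individual syntax objects, here is a minimal sketch. The file name flow.rkt and its line and column numbers are invented, and user-stx merely stands in for a form the user wrote.

```racket
#lang racket/base

;; Stand-in for a form the user wrote, at a hypothetical location:
;; (list source line column position span).
(define user-stx
  (datum->syntax #f '(m) (list "flow.rkt" 7 2 120 3)))

;; A plain template: its location is wherever the template itself
;; was written, not where the user's form was.
(define plain (syntax (~>> (filter odd?) (map sqr))))

;; syntax/loc: the same template, but the resulting outer syntax
;; object takes its location from the user's form.
(define located (syntax/loc user-stx (~>> (filter odd?) (map sqr))))

(syntax-source located)  ; "flow.rkt"
(syntax-line located)    ; 7
```

Only the outermost syntax object gets the new location, which hints at why applying this consistently across every expansion step would touch a lot of code.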

Demarcating the Language Boundary

This option involves attaching a boundary syntax property during expansion of all built-in Qi forms, including macros, thus identifying the precise point during expansion of surface syntax that a component of syntax enters the "Standard Qi" language boundary. This could be done by defining a macro analogous to define-qi-syntax-rule that's for internal use within the Qi codebase. This macro would be the same, except that it would also attach the boundary property to the expansion. In addition to this macro, Syntax Spec itself would need to attach this property if needed when applying ad hoc expansion rules like converting infix template syntax to prefixed core form syntax (e.g. (filter odd? __) → (#%blanket-template (filter odd? __))).
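The mechanism itself is just Racket's syntax-property. A minimal sketch, assuming a made-up property key 'qi-boundary and made-up helper names (Qi defines none of these today):

```racket
#lang racket/base

;; Hypothetical property key marking entry into "Standard Qi".
(define boundary-key 'qi-boundary)

;; Attach the marker to a syntax object (returns a new object).
(define (mark-boundary stx)
  (syntax-property stx boundary-key #t))

;; Check whether a syntax object carries the marker.
(define (at-boundary? stx)
  (and (syntax-property stx boundary-key) #t))

;; Stand-in for the result of expanding a built-in Qi form.
(define expansion
  (datum->syntax #f '(#%blanket-template (filter odd? __))))
(define marked (mark-boundary expansion))

(at-boundary? expansion) ; #f
(at-boundary? marked)    ; #t
```

An internal variant of define-qi-syntax-rule would call something like mark-boundary on its expansion. How to turn the marked pieces back into a single blameable expression is the open question.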

If we could leverage this boundary information to synthesize a syntax object that represents how the source expression would look if it were expanded "only up to" the language boundary, that synthesized syntax might prove to be a useful syntax object to blame.

For instance, for any of these Qi source expressions:

;; 1. The flow written directly
(~>> (filter odd?) (map sqr))

;; 2. The same flow behind a single macro
(define-qi-syntax-rule (m)
  (~>> (filter odd?) (map sqr)))

(~> ((list 1 2 3)) (m))

;; 3. The same flow behind a tower of macros
(define-qi-syntax-rule (n)
  (filter odd?))

(define-qi-syntax-rule (p)
  (map sqr))

(define-qi-syntax-rule (m)
  (~>> (n) (p)))

(~> ((list 1 2 3)) (m))

… it would be ideal if we could synthesize and then blame:

(~>> (filter odd?) (map sqr))

We did not immediately come up with an algorithm that would allow us to use such a boundary property to synthesize this object, however, so at present it's unclear whether it will help. We agreed it could be worth implementing if indeed it provides us this result, and not, say, something more comparable to what the de-expander already gives us, which for reference would be (~> (filter odd? __) (map sqr __)).

Back Where We Started?

There are a couple of options identified above that are worth exploring further, but the de-expander isn't going anywhere for the moment.

Tweaking Compiler Strategies

Last time, we discovered the following case where it would be necessary to apply the deforestation optimization to a fixed point in order to fully deforest the expression:

(~>> (filter odd?) (map sqr) (foldr + 0) range (filter odd?) (map sqr))

In applying the optimization, the compiler first matches the sequence filter … foldr and deforests it. But the sequence range … map is also deforestable, and without applying the optimization to the expression (for it is a single expression) once again, it would be missed.
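The shape of the problem can be shown with a toy model. This is not Qi's compiler: stages are plain symbols, and the invented pass fuse-first fuses only the first adjacent filter/map pair it finds, mimicking a pass that stops after one match.

```racket
#lang racket/base
(require racket/match)

;; Toy pass: rewrite only the FIRST adjacent filter/map pair.
(define (fuse-first stages)
  (match stages
    [(list 'filter 'map rest ...) (cons 'fused rest)]
    [(cons s rest) (cons s (fuse-first rest))]
    ['() '()]))

;; Apply a pass repeatedly until the expression stops changing.
(define (fixed-point pass)
  (lambda (expr)
    (let loop ([expr expr])
      (define next (pass expr))
      (if (equal? next expr)
          expr
          (loop next)))))

(define stages '(filter map foldr range filter map))

(fuse-first stages)                ; '(fused foldr range filter map)
((fixed-point fuse-first) stages)  ; '(fused foldr range fused)
```

A single application leaves the second fusable sequence untouched. Iterating to a fixed point picks it up.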

We had also already noted another case of premature termination of compiler passes in the past. This time we reviewed a PR to address these. We agreed that it would be good to:

  1. Add a test to validate the above case.
  2. As syntax traversal (i.e. find-and-map) is used in the implementation of both optimizations and bindings, add dedicated tests for bindings to ensure that the changes don't break anything there.

A Glimpse Into the Future

We've considered supporting alternative semantics for flows before, where there could be dialects of Qi that share the same surface language, but, under the hood, execute each flow differently, for instance in threads or processes rather than simple functions. But in recent discussions we've also felt that such execution strategies could be employed in the Qi compiler to improve performance of the existing Qi language. In particular, the tee junction, -<, naturally has no interacting dependencies between its tines, so it seems that implementing each tine using a future could be a seamless way to improve performance.
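As a sketch of what this could look like in plain Racket (outside of Qi, and simplified to a single input), each tine becomes a future that is touched when the results are needed. The name tee-with-futures is hypothetical, and this is not how -< is implemented today.

```racket
#lang racket/base
(require racket/future)

;; Hypothetical tee junction: apply every tine to the same input,
;; each in its own future, and return one value per tine.
(define (tee-with-futures . tines)
  (lambda (x)
    ;; Start all the futures before touching any of them.
    (define futures
      (for/list ([f (in-list tines)])
        (future (lambda () (f x)))))
    ;; touch blocks until a future's result is available.
    (apply values (map touch futures))))

((tee-with-futures add1 sub1) 5)  ; returns the two values 6 and 4
```

Whether this actually runs in parallel depends on the platform and on what the tines do, since operations that are not future-safe make a future wait until it is touched.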

Yet, as Ben pointed out on Discord, futures can make reasoning about performance challenging in some cases, so it would be ideal to be able to opt into or out of their use on demand. And where does this fit in with the broader discussion on alternative semantics? Is this a complementary line of exploration or is it the same line of exploration?

It does seem that these two approaches are ultimately one and the same. After all, if we can solve the broader problem of imbuing a common surface language with alternative semantics on demand, then that would entail the ability to override core Qi implementations with implementations using futures (among other things). On the other hand, overriding core functional implementations with futures directly in the compiler today would preclude this neat composition.

Now that the modular compiler architecture has been merged, we are already in a position to leverage the common Qi expander to elaborate the surface language, and require extensions on-demand by simple use of require, for instance, (require qi/list) to gain list-specific optimizations like stream fusion. This is a step in the direction of the goals we had outlined for supporting alternative semantics for Qi.

But without introducing additional complexity (or "complications" as they say in watchmaking), we cannot use this current extension approach to provide different implementations for anything that is in the Qi core language. That is, we could implement core forms like -< using functions, or futures, or threads, or some combination based on conditional assessment of the source code. But we must do exactly one of these, and overriding it using a simple extension mechanism would involve cross-module interaction at compile time that could get messy.

The approach of formally composing languages, which we've talked about at various times, could provide us a cleaner way to do this, since with compilation passes modeled as languages that commute, and by users explicitly defining the composition, we can simply leave out or swap out any compilation stage, do them in any order, and even replace code generation with a different backend. It does not require any special handling beyond the core approach that we've already outlined and discussed at a high level, and could allow us to achieve our stated goal of being able to use a common surface language, ordinary Qi macros, and a common expander, while compiling the language with optimizations on demand, and to any number of different backends, in a well-defined and scalable way. The specifics of this approach remain to be worked out and it's likely to be a topic of discussion in the coming weeks.

In any case, exploring alternative backends is an increasingly promising avenue of research. As it happens, it is also closely related to our recent discussions about Qi's theory of effects. After all, how can all of these languages provide different semantics and yet still be "Qi?" The theory of effects will provide the answer here.

Next Steps

(Some of these are carried over from last time)

  • Schedule a discussion for Qi's theory of effects and merge the corresponding PR.
  • Schedule a discussion for the language composition proposal and implement a proof of concept.
  • Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
  • Deforest other racket/list APIs via qi/list.
  • Decide whether there will be any deforestation in the Qi core, upon (require qi) (without (require qi/list)).
  • Review and merge the fixes for premature termination of compiler passes.
  • Continue investigating options to preserve or synthesize the appropriate source syntax through expansion for blame purposes.

Attendees

Dominik, Michael (briefly), Sid
