Qi Meeting Mar 15 2024

Siddhartha Kasivajhula edited this page Mar 26, 2024 · 2 revisions

It's Syntax All the Way Up

Summary

We continued discussing ways to determine the culpable syntax when a compile-time error occurs, and didn't find any good answers. We also discussed progress on the deforestation of racket/list APIs, which has coincidentally also run afoul of some nontrivial blame issues.

Background

Last week, we started to explore some approaches to preserving the source syntax through expansion so that we can produce appropriate error messages in the compiler. It proved trickier than expected.

Whodunnit?

By the time the compiler is performing optimizations, the Qi surface syntax that the user wrote has already been expanded to the more austere core Qi language. But when an error occurs at this stage, we don't want to show the user an error in terms of this core language, but instead, obviously (we felt), in terms of the source syntax that the user wrote.

Take this example. If the user wrote this:

(~>> (filter odd?) (map sqr))

… that expands to this:

'(thread (#%blanket-template ((#%host-expression filter) (#%host-expression odd?) __)) (#%blanket-template ((#%host-expression map) (#%host-expression sqr) __)))

If something went wrong in compiling this, it would be ridiculous to show them this core language expression, as they wouldn't be able to make head or tail of it. So, of course, we should show them the first expression!

When we released Qi 4 a few weeks ago, since we didn't actually have the source syntax available during compilation, we had to implement a ✨ "de-expander" ✨ to reconstruct the source syntax from the target expression for use in the error message. But this approach is brittle and just a hack. We were interested in a more robust option that could perhaps be provided by Syntax Spec.
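
To make the brittleness concrete, here is a minimal sketch of what a de-expander of this kind might look like. It is not Qi's actual implementation: the function name de-expand is hypothetical, and it works on plain quoted data rather than syntax objects.

```racket
#lang racket/base
(require racket/match)

;; Hypothetical, simplified de-expander over plain data (NOT Qi's actual code).
;; It pattern-matches core forms and guesses the surface form they came from.
(define (de-expand core)
  (match core
    ;; Guess: a `thread` of trailing-`__` templates was written with ~>>
    [`(thread ,flos ...)
     `(~>> ,@(map de-expand flos))]
    ;; Drop the trailing `__` that the blanket template carries
    [`(#%blanket-template ,(list parts ... '__))
     (map de-expand parts)]
    ;; Unwrap host expressions
    [`(#%host-expression ,e) e]
    ;; Anything unrecognized passes through unchanged
    [other other]))

(define core-expr
  '(thread
    (#%blanket-template ((#%host-expression filter) (#%host-expression odd?) __))
    (#%blanket-template ((#%host-expression map) (#%host-expression sqr) __))))

(de-expand core-expr)
;; => '(~>> (filter odd?) (map sqr))
```

Note that the first clause has to guess. The same core expression could equally have come from (~> (filter odd? __) (map sqr __)), and every new surface form would need its own clause, which is why this approach doesn't scale well.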

We Know Whom to Blame!

But what if the user had written this:

(define-qi-syntax-rule (m)
  (~>> (filter odd?) (map sqr)))

(~> ((list 1 2 3)) (m))

Here, the ultimate source expression is simply (m) or (~> ((list 1 2 3)) (m)). But blaming either of these doesn't seem the most helpful. We'd like to blame (~>> (filter odd?) (map sqr)) again, as we did earlier.

There's a Lot of Blame Gettin' Thrown Around Here

Now what if the user had written this:

(define-qi-syntax-rule (n)
  (filter odd?))

(define-qi-syntax-rule (p)
  (map sqr))

(define-qi-syntax-rule (m)
  (~>> (n) (p)))

(~> ((list 1 2 3)) (m))

Should we blame (~> ((list 1 2 3)) (m)), or (~>> (n) (p)), or (map sqr), or (filter odd?)?

Uh, we're overcomplicating it. Can't we just blame (~>> (filter odd?) (map sqr)) somehow? Obviously, we should expand the expression until it reaches the expression we know we want to blame! Jeez.

Well, for that matter, ~>> itself is a macro and isn't part of the Qi core language, even though it's part of the surface Qi language included in the standard distribution (it's defined in qi-lib/flow/extended/forms.rkt). It's built-in, but it's still a macro.

So how do we even know to blame (~>> (filter odd?) (map sqr)) rather than, say, (~> (filter odd? __) (map sqr __)) which is its immediate expansion?

OK fine, let's blame that.

But this expression is just another intermediate stage. How do we identify this as the right stopping point in the blame game, rather than, say, the next expansion, (#%blanket-template (~> (filter odd? __) (map sqr __)))?

😑

The more we talked about it, the less obvious it became who we should consider the culpable party and how we can identify them. We all agreed that we could rule out blaming the core language and the compiled target language. But who then? Should it be the actual source syntax the user wrote? Or perhaps a macro that they used? Or a built-in macro used by that macro? There are macros upon macros. How do we know where to stop? Which syntax would we like Syntax Spec to preserve through expansion?

Propagating Source Locations

We reluctantly arrived at another option that we've talked about before and dismissed: manually propagating source locations through expansion and compilation by using syntax/loc instead of the usual #' (short for syntax).

This would mean that we would be able to implicate the last expression written by the user, prior to any expansion by Qi.

TODO: In the last example above, which syntax would this allow us to blame?

We're reluctant to pursue this since it might involve a lot of changes in the code as we use #' everywhere. It could lead to subtle bugs in case we neglect to do this in any instance, and could make the work of maintaining the code a little more tedious than we'd like.
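
As a small illustration of the difference between the two, here is a sketch using a stand-in syntax object with a made-up source location. The file name "user.rkt" and the two helper functions are assumptions for the example and are not taken from the Qi codebase.

```racket
#lang racket/base

;; A stand-in for syntax the user wrote, with a made-up source location:
;; (list source line column position span)
(define user-stx
  (datum->syntax #f
                 '(~>> (filter odd?) (map sqr))
                 (list "user.rkt" 10 2 100 29)))

;; Hypothetical rewrite using plain #': the result carries the location
;; of the template in *this* file, not the user's location.
(define (expand-plain stx)
  (syntax-case stx ()
    [(_ f ...) #'(thread f ...)]))

;; The same rewrite using syntax/loc: the outermost form of the result
;; takes its source location from stx.
(define (expand-with-loc stx)
  (syntax-case stx ()
    [(_ f ...) (syntax/loc stx (thread f ...))]))

(syntax-source (expand-with-loc user-stx)) ; "user.rkt"
(syntax-line (expand-with-loc user-stx))   ; 10
(equal? (syntax-source (expand-plain user-stx)) "user.rkt") ; #f
```

Only the outermost form gets the new location, so every rewrite step along the way would need the same treatment for the location to survive to the end.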

Do We Know Whom to Blame?

In the end, it seems that we really need to be able to clearly and precisely define whom we want to blame. If we are able to do this, then there is a chance that Syntax Spec will be able to preserve that syntax. Otherwise, the options we are left with aren't great:

  • Retain the ✨ de-expander ✨
  • Don't blame anyone, and let the error implicate the core language expression or fall through as a runtime error
  • Propagate source locations through use of syntax/loc

The first is brittle and bug-prone. The second leads to poor and unhelpful error messages. The third seems like tedious overhead on the developer experience, not just now but always.

We could consider blaming the initial source expression in every case. But as we saw earlier, this may often not be very helpful. It may still be better than either of our existing bad options, however. But is it good enough to be worth implementing?

Explicitly Demarcate a Language Boundary?

Another option could be to attach an "internal" syntax property during the expansion of every form included in the standard Qi distribution, including both core forms and macros. Essentially, this would allow us to distinguish the standard Qi language, including built-in macros, from user-defined macro extensions. Then, we could potentially preserve the first syntax encountered during expansion for which all component syntax has the boundary property attached. This should be the first syntax that is entirely in the built-in Qi surface language. It may be tricky to implement since macros are expanded "outside-in," and the syntax may already be a fair way to the core language before the stop condition is met, but there just might be a way to synthesize the expected "boundary" syntax from intermediate syntax objects by tracking the appearance of the boundary property on each component during expansion.
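
A rough sketch of the check this would involve is below. It assumes a hypothetical property key, 'qi-internal, and marks forms by hand; in a real implementation the marking would have to happen in the expander, and none of these names exist in Qi today.

```racket
#lang racket/base

;; Hypothetical property key; Qi does not define this.
(define boundary-key 'qi-internal)

;; Mark a form as belonging to the standard Qi distribution.
(define (mark-internal stx)
  (syntax-property stx boundary-key #t))

(define (internal? stx)
  (and (syntax-property stx boundary-key) #t))

;; A form is "entirely internal" if it is marked and all of its
;; parenthesized subforms are too. Atoms (e.g. odd?) are not checked.
(define (entirely-internal? stx)
  (define parts (syntax->list stx))
  (cond
    [(not parts) #t]
    [else (and (internal? stx)
               (andmap entirely-internal? parts))]))

;; Built-in forms are marked; the use of the user's macro (n) is not.
(define n-use (datum->syntax #f '(n)))
(define filter-form (mark-internal (datum->syntax #f '(filter odd?))))
(define map-form (mark-internal (datum->syntax #f '(map sqr))))

(define before-n-expands
  (mark-internal (datum->syntax #f (list '~>> n-use map-form))))
(define after-n-expands
  (mark-internal (datum->syntax #f (list '~>> filter-form map-form))))

(entirely-internal? before-n-expands) ; #f
(entirely-internal? after-n-expands)  ; #t
```

The sketch sidesteps the hard part noted above: because expansion proceeds outside-in, the after-n-expands form never exists as a single syntax object during real expansion and would have to be synthesized.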

Runtime vs Compile Time

More generally, we considered how, when something goes wrong at runtime, we have a stack trace available (or, in broader terms, continuation frames), which contains the context of execution, including the sequence of evaluation steps such as function invocations.

We don't have something like that for syntax expansion when something goes wrong at compile time, since intermediate syntax is discarded during expansion, for memory consumption reasons. If we could reliably identify a source expression we are interested in preserving, then that would give us a start on preserving it in Syntax Spec for DSLs (even if not for Racket in general), but we don't have anything concrete here yet.
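
To illustrate what the compile-time analogue of a stack trace could look like, here is a toy sketch in which each expansion step records its input on its output via a syntax property. The property key 'expanded-from and the helpers are hypothetical; neither Racket's expander nor Syntax Spec works this way as far as these notes establish.

```racket
#lang racket/base

;; Hypothetical property key linking a form to the form it expanded from.
(define trace-key 'expanded-from)

;; Wrap an expansion step so that its output remembers its input.
(define (traced step)
  (lambda (stx)
    (syntax-property (step stx) trace-key stx)))

;; Walk the chain back to the original source, most recent form first.
(define (expansion-trace stx)
  (define prev (syntax-property stx trace-key))
  (cons (syntax->datum stx)
        (if prev (expansion-trace prev) '())))

;; Two toy expansion steps, standing in for real macros.
(define expand-m
  (traced (lambda (stx)
            (datum->syntax #f '(~>> (filter odd?) (map sqr))))))
(define expand-right-thread
  (traced (lambda (stx)
            (datum->syntax #f '(~> (filter odd? __) (map sqr __))))))

(expansion-trace
 (expand-right-thread (expand-m (datum->syntax #f '(m)))))
;; => '((~> (filter odd? __) (map sqr __))
;;      (~>> (filter odd?) (map sqr))
;;      (m))
```

Keeping every intermediate form alive like this is exactly the memory cost mentioned above, and the trace still doesn't tell us which entry to blame.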

In some ways, the Macro Stepper gives us the equivalent of this, but it's a different kind of tool, perhaps an alternative, rather than something we can draw on in error messages. Even on that front, DSLs don't have the same level of support in the Macro Stepper as Racket itself, so this isn't an especially viable recourse for us at this time.

Progress on Deforesting racket/list

Dominik has begun deforesting more racket/list APIs. He mentioned that while placing contracts on stream producers to manage blame was a reasonable solution in our initial implementation, it does not appear to be the right approach for consumers and will require some thought to identify what would be appropriate. It will likely be informed by our continuing discussions on the syntax "blame game."

We also discussed whether simply doing (require qi) should include any list deforestation out of the box (as it formerly did) or whether we should expect an explicit (require qi/list). We could potentially even consider mirroring racket and racket/base here with qi and qi/base. We agreed that it was too early to know for sure on this, as the decision will be informed by the actual implementation, which we are still in the early stages of. We also considered whether the qi/list collection should be included in qi-lib or as a separate package, qi-list, that would be included in the composite qi package that's typically installed using raco pkg install qi (following the composable packages best practice in Racket). We felt the latter would be preferable, but it's still a bit early to know what this will look like.

Next Steps

(Some of these are carried over from last time)

  • Merge "docs arrears" PR containing documentation related to Qi 4, including effect locality, etc.
  • Review language composition proposal and implement a proof of concept.
  • Decide on appropriate reference implementations to use for comparison in new benchmarks report and add them.
  • Deforest other racket/list APIs via qi/list.
  • Decide on whether there will be any deforestation in the Qi core, upon (require qi) (without (require qi/list)).
  • Review whether we can continue optimizing a single syntax node at many levels.
  • Preserve the source syntax through expansion in Syntax Spec.

Attendees

Dominik, Michael, Sid