Skip to content

Qi Compiler Sync Oct 13 2023

Siddhartha Kasivajhula edited this page Oct 20, 2023 · 6 revisions

I Once Was Blind but Now I See

... or, "The Spectre of Friday the 13th"

Qi Compiler Sync Oct 13 2023

Adjacent meetings: Previous | Up | Next

Summary

We succeeded at integrating Dominik's proof-of-concept providing visibility into compilation stages as macroexpansion steps in the macro stepper, but only after a prolonged tussle with the spectre of Friday the 13th. Dominik also demoed a simple implementation of the threading macro for Chicken Scheme.

Background

Last time, we discussed issues in debugging and testing the Qi compiler, and came up with the classification of the deforestation optimization in terms of stream producers, transformers and consumers. In the interim, Dominik wrote a proof of concept showing how expansion events can be used to expose syntactic transformations to the macro stepper. This time, we started to integrate this into the compiler.

Seeing the Unseen

As we noted recently, following the expansion of a Qi expression in the Macro Stepper shows all of the expansion and compilation steps as a single step, making it hard to debug the compilation process:

(#%app
 (☯ (~>> (map string-upcase) (foldl string-append "I")))
 (list "a" "b" "c"))

→  [Macro transformation]

(#%app
 (let ()
   (qi0->racket
    (thread
     (esc
      (λ (lst)
        ((cstream->list
          (inline-compose1
           [map-cstream-next string-upcase]
           list->cstream-next))
         lst)))
     (#%partial-application
      ((#%host-expression foldl)
       (#%host-expression string-append)
       (#%host-expression "I"))))))
 (list "a" "b" "c"))

Dominik wrote a proof of concept to give us more visibility here, specifically showing the transformation of syntax performed as part of processing any bindings (i.e. the process-bindings compile-time function) in the source expression:

(thread
 (esc
  (λ (lst)
    ((foldl-cstream
      string-append
      "I"
      (inline-compose1
       [map-cstream-next string-upcase]
       list->cstream-next))
     lst))))

→  [Macro transformation]

(let ()
  (qi0->racket
   (thread
    (esc
     (λ (lst)
       ((foldl-cstream
         string-append
         "I"
         (inline-compose1
          [map-cstream-next string-upcase]
          list->cstream-next))
        lst))))))

while executing macro transformer in:
(#%app
 (☯ (~>> (map string-upcase) (foldl string-append "I")))
 (list "a" "b" "c"))

... where we see the "lifted let" that is the result of processing bindings. Of course, in this case there aren't any bindings, but the point is that this shows that individual transformations can be displayed in the macro stepper by use of expansion events, in this case, emit-local-step.

Compiler as Expander?

Recently, we've talked about whether the compiler could simply be implemented as a series of macros. This would be convenient since it would fit into an existing well-supported pattern (macroexpansion), which includes lots of tooling such as the macro stepper, which would be able to show the compilation stages without any further work needed on our part. For instance, we could implement a stage of the compiler like:

(define-syntax (normalize stx)
  (syntax-parse stx
    ...))

... so that we could do (normalize flo-expr) and then pass the result of that to another macro corresponding to the next stage of compilation, e.g. (deforest (normalize flo-expr)). The problem is that macros are expanded outside-in, so this doesn't naively work. We considered whether something like local-apply-transformer might allow this to work, but concluded that that allows us to macroexpand anywhere spatially in the code, but not temporally in terms of order of evaluation.

So it seems that compiler really needs to be a series of compile-time functions, rather than macros. So using the "events" hook into the macroexpander seems to be the way to go as far as gaining visibility into the compilation process.

Macros for your macros

We agreed that adding code to emit events at every step of interest would obscure the code and make it hard to understand, so we decided to write a macro to do it implicitly:

(begin-for-syntax
  (define-syntax-rule (qi-expansion-step name stx0 stx1)
   (let ()
     (emit-local-step stx0 stx1 #:id #'name)
     stx1)))

We wrote another macro for the corresponding binding form:

(begin-for-syntax
  (define-syntax-rule (define-qi-expansion-step (name stx0)
                        body ...)
    (define (name stx0)
      (let ([stx1 (let () body ...)])
        (qi-expansion-step name stx0 stx1)))))

The way the compiler passes work at the moment is, they apply the rewrite rules for that pass repeatedly until the syntax reaches a fixed point. We could either see the before/after at the level of an entire pass, or at the level of individual rewrites. We decided on the latter, so we modified the normalize-rewrite function to use the new macro:

(define-qi-expansion-step (normalize-rewrite stx)
 ...)

We expected this to emit an event each time a normalization rule was applied to an expression. The code compiled and the tests passed, so everything should work, right?

The Spectre of Friday the Thirteenth

One would hope so, but sadly, the spectre of Friday the Thirteenth loomed large and thwarted our hopes. We looked at the original expression through successive stages of expansion in the Macro Stepper, but the output did not show the sequence of normalization steps we were expecting, and we couldn't figure out why.

There's no escape

We first got distracted by the appearance of esc during expansion, which we soon remembered is a rewrite rule in the expander (implemented using syntax-spec), to ensure that unadorned function identifiers are rewritten explicitly to a core language form (i.e. esc) so that it can be reasoned about unambiguously in the compiler:

(syntax-spec
 ...
 (nonterminal/nesting floe (nested)
  ...
  (~> f:id
      #:with spaced-f ((make-interned-syntax-introducer 'qi) #'f)
      #'(esc spaced-f))

(Here, it uses the Qi binding space so that if the function identifier happens to be defined in the Qi binding space, then that takes precedence over a Racket binding, i.e. the "loophole in Qi space". Otherwise, it is equivalent to not using a binding space, as the Racket binding would just be used here).

Ghosts of Departed Patterns

We picked the first normalization rule we saw:

      ;; restorative optimization for "all"
      [((~datum thread) ((~datum amp) onex) (~datum AND))
       #`(esc (give (curry andmap #,(compile-flow #'onex))))]

... and attempted to test it with this expression to see if it would show the normalization steps:

((☯ (~> (>< sqr) AND)) 3)

... only realizing later that this rule was leftover from an early prototyping stage when AND was still in the core language, and isn't applicable anymore. A local townsperson told us, "that core form has been dead for fifteen years!" 👻

We decided to test this rule instead:

   ;; trivial threading form
   [((~datum thread) f)
    #'f]

... using this source expression:

((☯ (~> (~> (~> sqr)))) 3)

But we still didn't see the normalization steps in the macro stepper.

"You're Sure it's Using the Latest Code Right?"

Yeah, of course I'm sure. Look, it's showing these debug logs that we added in the terminal. So why weren't the normalization events being emitted? Well, after wandering the spectre's labyrinth for almost an hour, we realized that everything was using the latest code ... except the Emacs buffer from which we were starting the macro stepper 🤦 .

We re-"ran" the buffer to account for code changes on disk, and we were now seeing individual successive normalizations of the source expression. For instance, for this expression:

((☯ (~> (~> (~> sqr)))) 3)

... we were expecting the wrapping threading forms to be successively removed by normalization. And that's exactly what we saw:

(thread (thread (thread (esc (#%host-expression sqr)))))

→  [Macro transformation]

(thread (thread (esc (#%host-expression sqr))))
(thread (thread (esc (#%host-expression sqr))))

→  [Macro transformation]

(thread (esc (#%host-expression sqr)))
(thread (esc (#%host-expression sqr)))

→  [Macro transformation]

(esc (#%host-expression sqr))

Context is everything

One concern which had been raised by Ryan on Discourse was that using events in this way would not have access to the full context of expansion, and thus would show each transformation in sequence, without knowing that some of these transformations are done as part of expanding a containing expression. Thus, we see the composite transformation of the containing expression shown again after the transformations of component expressions, rather than these components being shown as expanded within the context of the containing expression (as we see with normal macroexpansion in the Macro Stepper).

Many of the events that serve to form this tree-structured context of expansion are private and used only in the Racket source. One way to address the issue would be if the "remark" or "artificial" steps (used in emit-local-step) could provide a way to indicate the context (such as perhaps a parent syntax object? It's unclear if we will have access to the parent syntax though).

Another way could be potentially exposing some of those other macro debugger hooks publicly that are currently private.

Another option could be for syntax-spec to provide APIs resembling define-qi-expansion-step, maybe "define-compile-step", which could manage the generation of events to allow the expander to parse the tree structure of expansion that it does for the regular expansion phase.

Next Steps

Although we had not originally identified this as being among the must-haves for releasing the compiler work, we agreed that having visibility into compiler passes will greatly improve the developer experience and help us complete the work faster, so we will continue to work on this for now.

(Some of these are carried over from last time)

  • Continue to integrate macroexpansion events into compiler rewrite rules to afford visibility in e.g. the Macro Stepper.
  • Provide good error messages in optimized (e.g. deforested) expressions in terms of source expressions.
  • Validate that optimization passes can prematurely terminate if they don't return false when they don't transform the input (and then fix it).
  • Implement streams for the other list operations described in St-Amour's writeup (i.e. unfoldr zip average upto). Note: implementing average might benefit from having "multi-valued CPS" for deforestation.
  • Write unit tests for more compiler rewrite rules, as well as end-to-end tests for pairs of input expressions, and sanity tests for the expander.
  • Find a way to reliably detect list-like operations on values.
  • Rewrite these list-like operations to list-based functional pipelines, e.g. (~> (-< (>< f) (>< g)) ...)
  • Add additional nonlocal benchmarks to validate all cases optimized by stream fusion.
  • Generalize the stream implementation to support multiple values at each step (in addition to zero or one) and recheck the performance.
  • Review other core Qi combinators besides ~> (e.g. -<, == and ><, maybe if and pass) and see whether it would make sense to extend fusion to them.
  • Should Qi provide a map utility (overriding the default one) that supports "0 or more" values (as was discussed in the RacketCon talk)?
  • Design a core continuation form(s?) for Qi, and show how try can be implemented in terms of it.
  • Describe the sequence of compilation for input expressions with the optimizing compiler against the reference compiler, showing cases in which these do not terminate.
  • Come up with a methodology to formalize and theoretically validate compiler behavior.
  • Review other bindings cases that we could support next (e.g. selective binding, packing values, and binding exceptions in try).

Attendees

Dominik, Sid (featuring special appearances by Ben, Black and White, Ferdinand, Mummy, Teresa, Tomcat, a deer, a squirrel, and of course, the Spectre of Friday the Thirteenth)

Clone this wiki locally