Skip to content

Qi Compiler Sync Sept 2 2022

Siddhartha Kasivajhula edited this page Dec 15, 2022 · 5 revisions

Achieved End to End Functionality and Compiler Ideas

Qi Compiler Sync Sept 2 2022

Adjacent meetings: Previous | Up | Next

Summary

We picked up where we left off last week and achieved end-to-end functionality with bindingspec integration! We also discussed compiler optimization and implementation ideas at a high level. Since things are working end-to-end now, we should be able to start exploring optimizations. We also talked about the nature of the binding spaces "quirk" that we encountered last time.

Background

Previously, we began incorporating the bindingspec meta DSL to abstract the DSL expander, to outsource Qi's expander and to benefit from formal robustness properties that would give us good error messages for free. When we last left off, the package was building correctly, but there were a few test failures that were caused by syntax properties employed in Qi macros that were not copied over during the expansion by bindingspec.

Even prior to starting the integration with bindingspec, we'd noticed that while Qi was building locally and tests were passing, dependent packages were failing to build.

Another minor issue encountered last time was that Qi's report-syntax-error utility did not correctly report the source line of code.

In the interim, Michael added support for copying over syntax properties, and Sid fixed the report-syntax-error utility.

End to End Functionality

We picked up where we left off and, after pulling bindingspec and ee-lib afresh and rebuilding them, re-ran the tests. There were still failing tests. To debug these we created a separate flow-dummy macro that only expanded the flow, without compiling it, and printed the value of the syntax property of interest. We also wrote a test that invoked it with the failing input. It emerged that bindingspec was propagating syntax properties on core forms but not on production rules at the expander level. This fix was added and that caused the remaining tests for the flow macro to pass!

There were still some failing tests in the Qi interface macros and some of them were fixed by adding the necessary missing imports.

Finally, the interface macro modules (e.g. threading.rkt and switch.rkt, which provide ~> and switch macros at the Racket level) needed to do some internal renaming in order to steer clear of the binding spaces "quirk" we encountered last time (described in more detail below). This also happened to be the reason for the build failure in dependent packages that had been noticed earlier.

With all of those changes, all of the tests were passing, and dependent libraries (e.g. relation) were building correctly as well. But Qi's docs were failing to build, with the message:

: "eval:2:0: ?: access disallowed by code inspector to unexported transformer\n  from module: \"/Users/siddhartha/work/racket/qi/qi-lib/flow/extended/expander.rkt\"\n  at: ~>\n  in: thread.1"
  context...:
   /Applications/Racket-Latest/collects/racket/private/more-scheme.rkt:163:2: select-handler/no-breaks
   /Applications/Racket-Latest/share/pkgs/scribble-lib/scribble/eval.rkt:356:9
   .../private/map.rkt:40:19: loop
   [repeats 1 more time]
   body of "/Users/siddhartha/work/racket/qi/qi-doc/scribblings/intro.scrbl"

We guessed that this may have something to do with the interaction between sandbox evaluators and binding spaces, and fixed it by using call-with-trusted-sandbox-configuration in the creation of the evaluator, as described in this mailing list thread. This works by reducing the number of safeguards provided by the sandbox evaluator. There's no particular reason sandbox safeguards are needed here anyway.

With that, there were no remaining issues!

Binding spaces "quirk"

As mentioned earlier, all of the interface macros that share names with Qi forms (i.e. switch, ~>, ~>>) needed to be renamed internally and renamed-out with their original names, in order to avoid a certain quirk in the implementation of binding spaces. That quirk is explained below.

To understand the issue, it's helpful to think about two different idealized ways in which a binding space could be thought of.

One is to consider them completely independent, so that each identifier has a distinct set of scopes for each space (i.e. the identifier is associated with a tuple of scope sets). In this case, bindings in different spaces could be used together without conflict, but it could cause confusion as it may not be easy for programmers to know which binding is being used in each case.

The other way is to consider that bindings in different spaces have associated meanings, which could be modeled as a common set of scopes (i.e. a single scope set for each identifier) used across all binding spaces, with an additional mapping from (binding, binding space) to a particular meaning. For example, this supports considering Racket's and as being in some way related to Qi's and or another and in a different language like Minikanren. In this case, if the same binding were imported from more than one space, the expander could signal an error that the binding is ambiguous without further qualification.

Racket's implementation is to treat binding spaces as simply another scope, taking advantage of the pre-existing "interned syntax introducer" which allows us to define bindings that have the same scopes across modules (in contrast to the typical case where bindings have scopes that distinguish whether they are imported from another module or defined locally).

This approach minimizes the number of components that need to be modified (allowing it to be implemented in Racket at module import/export level without requiring changes to other parts of the binding table), but on the other hand represents an awkward combination of behaviors in relation to the other two approaches. Using bindings with the same name in different spaces is unhandled, and as a result, in such cases, the error falls through the cracks and the resulting error message is not especially helpful. Additionally, in those cases where a confused binding, by coincidence, doesn't cause a syntax error, it may result in unexpected runtime behavior. For example, these are some observed error messages:

switch: bad syntax
switch: expected more terms

In these cases, the Racket macro switch bound an identifier switch used within a Qi flow, and attempting to expand the Qi flow using the Racket macro failed with the above errors. In one of the two idealized approaches, it would either not have caused an error at all, or would have specifically indicated the ambiguity in resolving the binding in consideration of the different spaces.

It will be useful to keep this quirk in mind as we work with binding spaces. For now, in such cases, the best workaround is to either rename-in the imported identifiers from the (e.g. qi) binding space, or otherwise rename-out the Racket identifiers from the defining module.

Compiler Implementation

  • Is syntax-parse the right tool to use in producing intermediate stages of compilation (IRs)? Yes, could use syntax literals to identify new names. Could also explore the nanopass library or using struct types.
  • What would the IRs look like? Should each pass start and end from the same place so that they are easily composable? Typically, could be convenient to use subsets or supersets of Qi, or Qi + extra syntax properties. We might also have a restricted language that syntactically enforces an invariant that we'd like to enforce.

Compiler Optimizations

Some high-level optimization strategies were discussed.

  • Qi could disallow "accidental" side effects, that is, providing no guarantees regarding side effects except for forms explicitly wrapped within an effect or esc form. This could open up a lot of optimization possibilities, but it is somewhat drastic.
  • Some basic things like cons and list are considered "effectful" in Racket, since it is a procedural rather than purely-functional language. As a result, shedding some of these guarantees (e.g. Racket guarantees that (cons 1 2) is not eq? to a separately-written (cons 1 2)) could allow us to treat them as effect-free as other functional languages like Haskell do.
  • Avoid constructing intermediate collections, like Clojure's transducers. This is called "deforestation" in the formal literature. For instance, (~> (>< f) (pass g)) could be rewritten to (~> (>< (if g f ⏚))), and even operations on canonical data structures like lists could be optimized, for instance expressions like (~>> (map f) (filter g) (foldl op v i)) should avoid intermediate representations as well. Note that Racket cannot do such optimizations because cons and list are considered effectful (as described earlier).
  • Index-based optimizations: If we consider an idealization where Qi has a small constant number of values flowing, we can imagine several index-related optimizations. For instance, flipping the order of inputs could be implemented as (lambda (a b) (values b a)) instead of computing a reversal of the inputs. Likewise, selecting the 3rd input could look like (lambda (a b c d e) c). This idealization may suggest optimizations that may be used in certain cases encountered in practice.
  • Restorative/one-off optimizations: For cases where reduction to a core language resulted in loss of performance, we could individually identify them by an optimization rule to "restore" their original implementation. In general, such one-off cases should be avoided since simply undoing the effect on its own isn't worth it, unless we are also able to leverage more generally applicable optimizations by virtue of the reduction to the smaller core language. (related: "peephole" optimizations that may transform e.g. (* 2) into a bitshift operator).
  • Phrase detection: Qi exhibits phrase structure. It may be useful to detect common phrases and map them to optimized versions.
  • Identities: Simplify expressions using Qi identities.
  • Would using an internal data structure, similar to APL's multidimensional array, improve performance? Probably not, since Qi is used with normal functions, and these would need translation from the data structure to values or lists and back again.
  • Arity modeling as an intermediate language in the compilation.

Next Steps

  1. Try out the latest code (that includes the bindingspec-produced expander). Things to observe:
  • quality of error messages
  • compilation times for the package (is there a difference from before?)
  • testing times (is there a difference from before?)
  1. Remove all the debugging code and tidy up the branch.

References

Attendees

Michael, Sid

Clone this wiki locally