Qi Meeting May 10 2024

Rockin' Refactor

Summary

We reviewed an in-progress refactor that Dominik is doing as part of racket/list deforestation to achieve cleaner separation of individual compiler passes. We also discussed Qi's release practices.

Background

We have three main options for the architecture of the compiler:

1. Explicit composition of passes in the compiler (the original approach, and still the current approach for code generation)
2. Hardcoded ordering defined by priority numbers, extensible via a mutable compile-time registry (the proposed and currently implemented architecture)
3. Explicit composition on the frontend, either by us or by the user (the proposed language composition architecture)

With the third option, Qi itself could compose the passes in the standard way in providing the language, but we could also afford users access to the passes directly in case they wish to compose them in a different order or add passes of their own, for maximum flexibility.

We are currently engaged in refactoring towards proper implementation of (2). Achieving clean separation of compiler passes here should also support (3) if we decide to go that route in the future.
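
To make the difference between the three options concrete, here is a minimal sketch in Python. It is a toy model and not Qi's actual Racket implementation: the IR (a tuple of operation names), the two passes, the priority numbers, and every name below (`compose`, `register_pass`, `compile_registered`, and so on) are hypothetical and chosen only for illustration.

```python
from functools import reduce
from typing import Callable, List, Tuple

Expr = Tuple[str, ...]           # toy IR: a flat sequence of operation names
Pass = Callable[[Expr], Expr]    # a pass rewrites one expression into another


def normalize(expr: Expr) -> Expr:
    """Toy pass: drop no-op 'id' steps."""
    return tuple(op for op in expr if op != "id")


def fuse_maps(expr: Expr) -> Expr:
    """Toy pass: collapse adjacent 'map' steps into one 'fused-map' step."""
    out: List[str] = []
    for op in expr:
        if op == "map" and out and out[-1] in ("map", "fused-map"):
            out[-1] = "fused-map"
        else:
            out.append(op)
    return tuple(out)


def compose(*passes: Pass) -> Pass:
    """Run the given passes left to right."""
    return lambda expr: reduce(lambda e, p: p(e), passes, expr)


# Option 1: the compiler itself composes the passes explicitly.
compile_explicit = compose(normalize, fuse_maps)

# Option 2: a mutable registry; ordering comes from priority numbers,
# not from the order in which passes happen to be registered.
_REGISTRY: List[Tuple[int, str, Pass]] = []


def register_pass(priority: int, name: str, fn: Pass) -> None:
    _REGISTRY.append((priority, name, fn))


def compile_registered(expr: Expr) -> Expr:
    for _, _, fn in sorted(_REGISTRY, key=lambda entry: entry[0]):
        expr = fn(expr)
    return expr


register_pass(20, "fuse-maps", fuse_maps)   # registered first, runs second
register_pass(10, "normalize", normalize)   # registered second, runs first

# Option 3: the passes are exposed, so a user may compose them differently.
compile_custom = compose(fuse_maps, normalize)

source = ("map", "id", "map", "filter")
```

In this toy model, `compile_explicit(source)` and `compile_registered(source)` both give `("fused-map", "filter")`, while the user-chosen ordering in `compile_custom(source)` gives `("map", "map", "filter")`, because the `id` step still separates the two maps when fusion runs first. The flexibility of option 3 therefore comes with responsibility for pass ordering.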

Refactorings

Code generation

We considered defining code generation as a pass to include the actual Qi0->Racket code generation as well as the subsequent bindings transformation, instead of just composing these explicitly.

After discussion, however, we questioned whether this would be a good step: in today's architecture this stage of compilation is mandatory, cannot be reordered, and must occur at the end. Promoting it to a pass would lend it an interface that we don't necessarily foresee supporting in the future, which poses a backwards compatibility concern. It could make the code more uniform from the perspective of compiler authors, as all stages of compilation would be modeled as "passes," but if that model doesn't accurately reflect the actual role of this stage, it could add complexity instead.

Modules

Some specific refactorings we talked about:

Rename passes.rkt -> compiler.rkt?
The name "let.rkt" does not suggest what this implementation of deforestation actually does. It should probably be renamed, e.g. to loop.rkt, to suggest that it is a single driver loop.
Rename modules named "impl" -> "runtime"

Sid assumed the term "runtime" specifically refers to the Chez Scheme runtime which executes compiled Racket code, but we discussed that in fact there is an analogous distinguishable concept of a runtime at each level of compilation. The Qi implementation in Racket defines a Racket runtime for Qi. The Racket implementation in Chez defines a Chez runtime for Racket, and likewise C or assembly for Chez. It is, after all, languages all the way down.

We also discussed that this (Racket) runtime module should be part of the Qi0 code generation stage (whether it is treated as a pass or not).

Infrastructure for Extending Compiler Passes?

Towards keeping compiler passes well abstracted, it might be desirable to have a single require that would help to seed a new compiler pass. But the current compiler passes all have slightly different modules that they need.

We considered constructing a "superset" module that requires all of the interfaces used in any compiler pass and then re-provides them. But we felt that a single require of this kind does not model even the existing state of affairs well, and it's not unlikely that new compiler passes would have distinct requirements, so every compiler pass would end up requiring more than it needs. We agreed that a common module required by all compiler passes should instead contain only the necessary provisions, that is, those that are used in every pass today. If excluding the Qi0 stage from passes would help make this necessary set more clear-cut, then that would be one further reason to keep it separate after all.

List deforestation

We discussed how having a stream fusion implementation of deforestation, even if it ends up being outperformed by the "named let compositing" approach, is beneficial for another reason: it is the approach used in GHC, the industry-leading Haskell compiler. An implementation that is one-to-one with this reference implementation gives us a good platform for comparing our approach against a reference that is widely used in the research literature, and so gives us more solid ground for publishing results that are relevant to the field.

The parallel approaches we implement for comparison may include continuation-passing style with state tupling / consing, CPS with the set! hack, and named let compositing (i.e. similar to Racket's for forms).
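
As a rough illustration of what is being compared, the sketch below models stream fusion in Python and sets it beside a hand-written single loop of the kind a fused pipeline is meant to reduce to. It is not Qi's Racket implementation, and all names (`Done`, `Skip`, `Yield`, `map_s`, `filter_s`, `foldl_s`, and the two example functions) are hypothetical. In GHC, the compiler inlines the step functions into one loop; Python performs no such optimization, so this shows only the structure of the technique: no intermediate lists are built between stages.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple, Union


@dataclass(frozen=True)
class Done:
    """The stream is exhausted."""


@dataclass(frozen=True)
class Skip:
    """No value this step; continue from `state`."""
    state: Any


@dataclass(frozen=True)
class Yield:
    """Produce `value` and continue from `state`."""
    value: Any
    state: Any


Step = Union[Done, Skip, Yield]
Stream = Tuple[Callable[[Any], Step], Any]   # (next function, initial state)


def list_to_stream(xs: List[Any]) -> Stream:
    def next_(i: int) -> Step:
        return Yield(xs[i], i + 1) if i < len(xs) else Done()
    return (next_, 0)


def map_s(f: Callable[[Any], Any], stream: Stream) -> Stream:
    next_, s0 = stream

    def next_mapped(s: Any) -> Step:
        step = next_(s)
        if isinstance(step, Yield):
            return Yield(f(step.value), step.state)
        return step
    return (next_mapped, s0)


def filter_s(pred: Callable[[Any], bool], stream: Stream) -> Stream:
    next_, s0 = stream

    def next_filtered(s: Any) -> Step:
        step = next_(s)
        if isinstance(step, Yield) and not pred(step.value):
            return Skip(step.state)   # rejected element: advance, emit nothing
        return step
    return (next_filtered, s0)


def foldl_s(f: Callable[[Any, Any], Any], init: Any, stream: Stream) -> Any:
    """The only loop in the pipeline: pull steps until Done."""
    next_, s = stream
    acc = init
    while True:
        step = next_(s)
        if isinstance(step, Done):
            return acc
        if isinstance(step, Yield):
            acc = f(acc, step.value)
        s = step.state   # both Skip and Yield carry the next state


def sum_of_odd_squares_fused(xs: List[int]) -> int:
    odds = filter_s(lambda x: x % 2 == 1, list_to_stream(xs))
    squares = map_s(lambda x: x * x, odds)
    return foldl_s(lambda acc, x: acc + x, 0, squares)


def sum_of_odd_squares_loop(xs: List[int]) -> int:
    """Hand-written single loop, for comparison."""
    acc = 0
    for x in xs:
        if x % 2 == 1:
            acc += x * x
    return acc
```

Both `sum_of_odd_squares_fused([1, 2, 3, 4, 5])` and `sum_of_odd_squares_loop([1, 2, 3, 4, 5])` return `35`. The benchmarking question is how close each compilation strategy gets to the hand-written loop.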

Qi's Release Practices

We discussed our proposed "continuous deployment" release process and what that means in practice. One thing it means is being diligent about maintaining test coverage, so that the burden of scrutinizing every PR for possible bugs and backwards incompatibilities is carried by automated processes we define, freeing contributors to focus on higher-level goals. It's a little more work on an ongoing basis for the authors of changes, but it makes for a much more pleasant and unobtrusive development experience for all Qi contributors in the long run.

Modeling User Impact With Tests

We discussed the seemingly high risk of user impact from "releasing" on every commit to the main branch. How can we be sure that we aren't breaking user code when we push our changes?

Our answer is that we define user impact by our tests. Much as the behavior of simple mathematical quantities in the study of physics serves as a model of the physical world, we'd like our tests to serve as a model of Qi use in the real world, a formal certification of behavior in the wild. The tests aren't there merely to catch bugs, but to be a minimal yet representative model of use. Thus, if the tests all pass and they are comprehensive (at least by the minimal measure of coverage), then that constitutes a certificate of release-worthiness.

We felt that if this is really what we seek, then we should consider either incorporating the testing of established Qi codebases (such as Frosthaven Manager) into our testing infrastructure, or at least asking Ben and other developers to review the current tests from the perspective of their applications and certify whether the tests capture, in the abstract, the cases that arise in their applications (contributing or suggesting any missing cases). In this way, users of Qi could participate in and help ensure the reliability of our continuous delivery model. Of course, such participation would be a bonus and an added measure, and should not be considered a necessary ingredient for the reliability of the process.

Reviewing Coverage Tools

We reviewed the current coverage report and found that many of the uncovered lines are in the deforestation module, including many lines that were formerly covered and that now show as uncovered in the refactored branch. This could be due to configuration or to test paths being missed by raco, and we will be investigating it. There is also the recent fix for emitting expansion events "breaking out of the sandbox," which wasn't covered. We considered whether this should be covered or whether it would be appropriate to find a way to mark it as reviewed and "ignored" by the coverage checker. But we concluded that, as it was an actual issue encountered in practice, it should definitely have a test to guard against regression.

We also realized that the coverage report distinguishes covered from uncovered lines using red and green. As Sid is (red/green?) colorblind, he cannot tell covered from uncovered lines out of the box! He usually changes the CSS class manually in the browser to use blue so that he can see which lines are uncovered. Dominik pointed out that this is not a very good experience and that it should be addressed upstream. We picked some appropriate colors from the colorblind-friendly Wong palette that we used in reporting compiler performance, and agreed to submit a PR.

We also wondered how the Cover tool determines whether lines are covered or not, and how accurate it is. Considering the many big changes to Racket since the tool was authored, including the switch to the Chez Scheme backend, it's conceivable that its accuracy could be affected and it would be good to have some idea of it. So far it seems fairly reliable just empirically, based on the feedback on uncovered lines in the past and the transition to being covered upon writing appropriate tests. But it's worth understanding in more detail over time.

Life Outside Qi

Dominik's band is getting together this weekend for a jam session after a 7-month hiatus!

Sid is feeling guilty about the very delayed release of Symex for Emacs (Dominik happened to notice and was giving Sid a hard time / some encouragement about it :)) and is thinking of giving it some attention this week.

Next Steps

(Some of these are carried over from last time)

Ask Ben to review tests from the perspective of Frosthaven Manager.
Review Cover's methodology for checking coverage.
Decide on a release model, document it in the user docs, and announce the performance regression (and the remedy) to users.
Improve unit testing infrastructure for deforestation.
Discuss and work out Qi's theory of effects and merge the corresponding PR.
Schedule a discussion for the language composition proposal and implement a proof of concept.
Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
Deforest other racket/list APIs via qi/list.
Decide on whether there will be any deforestation in the Qi core, upon (require qi) (without (require qi/list)).
Review and merge the fixes for premature termination of compiler passes.
Continue investigating options to preserve or synthesize the appropriate source syntax through expansion for blame purposes.

Attendees

Dominik, Michael, Sid
