Qi Meeting Mar 29 2024
We looked at a bug that was recently discovered where Scribble docs using packages depending on Qi were failing to build in some cases. It turned out to be related to the interaction between a sandboxed evaluator used in docs and the use of the macro debugger API in the Qi compiler. We also continued the discussion on the syntax "blame game" and made progress on what is shaping up to be a reliable and robust solution. We also discussed some challenges in generating appropriate errors for stream consumers in racket/list for the purposes of deforestation.
Some time ago, we noticed that docs for some packages depending on Qi were failing to build on the Racket Package Index.
We've been talking about ways to identify culpable syntax for a few weeks now and hadn't found any great answers, and had resigned ourselves to simply cultivating a greater appreciation for our existing "hacky" yet usable de-expander.
We've started looking into deforestation of more racket/list APIs. Consumers like list-ref are proving to be tricky as far as generating good error messages.
Some time ago, we noticed that Scribble docs for some packages depending on Qi were failing to build on the package index, with an error resembling:
dynamic-require: name is protected
name: 'syntax-local-expand-observer
module: #<resolved-module-path:'#%expobs>
context...:
body of "/Applications/Racket-Latest/share/pkgs/macro-debugger/macro-debugger/emit.rkt"
/Applications/Racket-Latest/share/pkgs/sandbox-lib/racket/sandbox.rkt:754:18
This certainly looked like it could have something to do with Qi's use of the Macro Debugger API. We use it to report compiler transformations as stages of "expansion" so that we can have some visibility into them (e.g. in the Macro Stepper) for debugging purposes. However, we were not able to reproduce the error locally at the time, so our tacit assumption was that the cause must be in some other dependency of these packages, or perhaps in these packages themselves. But in attempting to work around these downstream docs issues recently, Sid was able to reproduce the issue and trace its source to Qi after all: there was an old use of the macro debugger library that had presumably not been disabled the last time we tried to reproduce the error.
Investigating, we discovered that simply requiring the macro-debugger module was causing the error to happen (in downstream packages), even if we didn't actually use any APIs defined there! We traced it to this expression in macro-debugger/emit.rkt:
(define syntax-local-expand-observer
  (dynamic-require ''#%expobs 'syntax-local-expand-observer))
All of these packages that were failing had one thing in common: they all use a sandboxed evaluator to evaluate examples, and this evaluator required libraries that ultimately depended on Qi.
It turns out that some time ago, Michael and others had discovered a number of security vulnerabilities in the sandboxed evaluator that would allow sandboxed code to escape the constraints that were intended to be placed on it. For instance, the use of unsafe-car is disallowed as it could crash the machine, but there could be ways to use trusted code, such as Typed Racket, to find a back door into using unsafe code. Due to the seemingly unending series of issues that were reported at the time, Matthew decided that the safest way to prevent such issues was to disallow any macro reflection within the sandbox. This includes things like local-expand and syntax-local-value.
Another such module providing macro reflection is expobs, which is a module in the Racket expander that records expansion steps. Dynamically requiring this module in the emit.rkt module thus causes the sandbox to raise an error.
There are a few ways to address this.
First, the upstream emit.rkt module could perhaps avoid the dynamic require. If it were a static require, the error would not happen [TODO: why not?]. In principle, the use of the emit event API should be considered safe by the sandbox since we give it syntax and it isn't accessing syntax.
A workaround that we could do on the Qi side is to wrap the dynamic require with yet another dynamic require and catch the exception resulting from the original use, suppressing it.
Dominik created a PR to do this and it was merged soon after the meeting.
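The workaround described above could be sketched roughly as follows. This is not the code from the PR: try-dynamic-require and no-op are hypothetical names, the sketch assumes that the sandbox's "name is protected" error is raised as an exn:fail, and it assumes that emit-local-step is the binding of interest in macro-debugger/emit.

```racket
#lang racket/base

;; Hypothetical helper (not the code from the PR): attempt a dynamic
;; require and return `fallback` if it fails for any reason. The
;; assumption is that the sandbox's "name is protected" error is an
;; exn:fail, so it would be caught here as well.
(define (try-dynamic-require mod name fallback)
  (with-handlers ([exn:fail? (λ (e) fallback)])
    (dynamic-require mod name)))

;; A stand-in that accepts any positional and keyword arguments
;; and does nothing.
(define no-op
  (make-keyword-procedure (λ (kws kw-args . args) (void))))

;; If macro-debugger/emit cannot be loaded (it is not installed, or we
;; are inside a sandbox), reporting expansion steps silently becomes
;; a no-op instead of an error.
(define emit-local-step
  (try-dynamic-require 'macro-debugger/emit 'emit-local-step no-op))
```

The trade-off in a design like this is that the failure is suppressed silently, so compiler passes would simply not show up in the Macro Stepper when the debugger library is unavailable.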
In recent discussions, we've talked about ways to track source syntax entered by the user so that we can implicate it in runtime error messages when appropriate (otherwise, the Racket evaluator would implicate the compiled target expression rather than the source expression written by the user). We seemed to converge on the idea of a "language boundary" that would either need to be defined explicitly by the language by means of syntax properties or implicitly by careful use of syntax/loc.
First, why isn't syntax/loc always used instead of syntax? We noted that syntax-rules already does implicitly use syntax/loc, whereas syntax-case and syntax-parse do not. We figured this must be because in the former case there is always an explicit reference to a template and it is always used in the context of macroexpansion, whereas a utility like syntax-parse could be used in a broader range of settings where this default may make less sense. But also, it wouldn't make sense for user macros (rather than built-in macros) to implicitly use syntax/loc as that would result in the reverse problem where we only have the initial syntax, as opposed to our current problem of having only the final syntax.
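As a small illustration of the difference between the two forms, here is a sketch that builds a stand-in for user-written syntax with a made-up source location (the file name "user-input.rkt" and the positions are invented for the example) and then constructs the same template with syntax and with syntax/loc.

```racket
#lang racket/base

;; A stand-in for syntax written by the user, with a made-up location:
;; source "user-input.rkt", line 1, column 0, position 1, span 7.
(define user-stx
  (datum->syntax #f '(and g) (vector "user-input.rkt" 1 0 1 7)))

;; `syntax` (written #') records the location of the template itself,
;; i.e. somewhere in this file.
(define plain #'(if g #t #f))

;; `syntax/loc` builds the same syntax but copies the source location
;; of its first argument onto the outermost syntax object.
(define located (syntax/loc user-stx (if g #t #f)))

(syntax-source located)   ; the source of user-stx
(syntax-position located) ; the position of user-stx
```

Only the outermost syntax object is relocated; subexpressions of the template keep their own locations.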
Returning to the idea of a language boundary as the point at which we transition from using one to using the other, we wondered whether this boundary is a well-defined and perhaps missing abstraction in Racket error reporting today. It seems to be the case that languages should generally always employ syntax/loc rather than syntax in their implementation, thereby preserving a reference to the last syntax encountered during expansion that was entered by the user rather than expanded by a built-in macro of the language. In that case, should there perhaps be some APIs in place to make defining and working with this boundary easier?
We felt that we should continue to develop the current work in Syntax Spec and Qi into a proper solution, and then see whether it would be useful to generalize it further.
The first part of the solution involves writing macro-defining macros for internal use in the implementation of the language. That is, we already provide define-qi-syntax-rule and define-qi-syntax-parser for users to define Qi macros, and we use these same interfaces to define built-in Qi macros like switch. These employ syntax in their templates and so each expansion loses the source location information of the preceding expression. We need analogous forms that will use syntax/loc instead of syntax to preserve the preceding source location instead, and we could then use these to define any built-in Qi macros.
With Michael directing, we came up with:
(define-syntax define-core-qi-syntax-rule
  (syntax-parser
    [(_ (name . pat) template)
     #'(define-qi-syntax name
         (qi-macro
          (syntax-parser
            [(_ . pat) (syntax/loc this-syntax
                         template)])))]))
Unfortunately, it's not straightforward to do this for the "parser" form, since we don't have an explicit reference to the template in that case that we could wrap with syntax/loc, as we do here. To address this, we employ a kind of "decorator" (we "eta expand the macro transformer procedure") to wrap the parser with code that propagates the source location. Here's what we came up with:
(define-syntax define-core-qi-syntax-parser
  (syntax-parser
    [(_ name clause ...)
     #'(define-qi-syntax name
         (qi-macro
          (propagate-syntax-loc
           (syntax-parser
             clause ...))))]))
… that is, the propagate-syntax-loc is added here.
By adding this "decorator," we gain access to the syntax object whose source location we wish to propagate to the expansion:
(define (propagate-syntax-loc f)
  (λ (stx)
    (let ([res (f stx)])
      (datum->syntax res ; lexical context
                     ;; datum
                     (syntax-e res)
                     ;; for srcloc
                     stx
                     ;; for properties
                     res))))
But this doesn't correctly handle the case where an expansion is contained in the source expression, e.g. (and g) → g. Here, it would propagate the source location of (and g), but it would be better to use the expansion in this case, g, since it can be safely inferred as the more specific source of any resulting errors. For example, in (and (or (and (or (and (or an-expression)))))), all of the wrapping boolean forms simply disappear in the final expansion leaving just an-expression. If this ends up producing a syntax error, it would be most useful to implicate an-expression instead of the entire expression. On the other hand, an expansion like (mac a b) → (mac2 a) would not qualify since the expansion isn't itself contained in the source.
We handle this with the following modification:
(define (source-location-contained? inner outer)
  (and (equal? (syntax-source inner)
               (syntax-source outer))
       (>= (syntax-position inner)
           (syntax-position outer))
       (<= (+ (syntax-position inner)
              (syntax-span inner))
           (+ (syntax-position outer)
              (syntax-span outer)))))

;; Example: (and g) → g
;; This would naively highlight (and g), but in this case
;; we want to highlight g instead. So, we check whether
;; one expression is contained in the other, and if so,
;; keep the srcloc of the inner one, to handle this.
(define (propagate-syntax-loc f)
  (λ (stx)
    (let ([res (f stx)])
      (datum->syntax res ; lexical context
                     ;; datum
                     (syntax-e res)
                     ;; for srcloc
                     (if (source-location-contained? res stx)
                         res
                         stx)
                     ;; for properties
                     res))))
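To see the two cases side by side, here is a sketch that exercises these definitions on hand-built syntax objects at runtime. The definitions are repeated so that the example runs on its own. Everything else is made up for illustration: the helper stx-at, the source names "user.rkt" and "language-impl.rkt", the positions, and the toy transformers expand-and and expand-mac, which stand in for real Qi macros.

```racket
#lang racket/base

;; Repeated from the notes so that this sketch is self-contained.
(define (source-location-contained? inner outer)
  (and (equal? (syntax-source inner) (syntax-source outer))
       (>= (syntax-position inner) (syntax-position outer))
       (<= (+ (syntax-position inner) (syntax-span inner))
           (+ (syntax-position outer) (syntax-span outer)))))

(define (propagate-syntax-loc f)
  (λ (stx)
    (let ([res (f stx)])
      (datum->syntax res
                     (syntax-e res)
                     (if (source-location-contained? res stx) res stx)
                     res))))

;; Hypothetical helper: syntax located in "user.rkt" at a given
;; position and span (all on line 1).
(define (stx-at datum pos span)
  (datum->syntax #f datum (vector "user.rkt" 1 (sub1 pos) pos span)))

;; "(and g)" occupies positions 1-7; the inner g sits at position 6.
(define and-stx
  (stx-at (list (stx-at 'and 2 3) (stx-at 'g 6 1)) 1 7))

;; "(mac a b)" occupies positions 1-9.
(define mac-stx
  (stx-at (list (stx-at 'mac 2 3) (stx-at 'a 6 1) (stx-at 'b 8 1)) 1 9))

;; (and g) → g: the result is a piece of the input.
(define (expand-and stx)
  (cadr (syntax->list stx)))

;; (mac a b) → (mac2 a): the result is built fresh, as a template in
;; the language implementation would be, so it is located elsewhere.
(define (expand-mac stx)
  (datum->syntax #f
                 (list 'mac2 (cadr (syntax->list stx)))
                 (vector "language-impl.rkt" 10 0 200 8)))

(define r1 ((propagate-syntax-loc expand-and) and-stx))
(define r2 ((propagate-syntax-loc expand-mac) mac-stx))

(list (syntax-position r1) (syntax-span r1)) ; location of g
(list (syntax-position r2) (syntax-span r2)) ; location of (mac a b)
```

One thing this sketch does not cover: if both syntax objects lacked source information entirely, the sources would compare equal and the position arithmetic would be applied to #f, so a more defensive version might check for that first.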
We then redefined all built-in Qi macros to use these macros, rebuilt the code and ran tests, and everything was still working!
There's an old video online featuring Linus Torvalds introducing Git, and he says something like, "No one writes code that works the first time ... except me, and there's only one me." Well, I think we proved there's more than one Linus in town!
Now, it only remains to use this together with the APIs for querying surface syntax that we started to add to Syntax Spec recently, to actually get a reference to the user-entered surface syntax in generating errors in the compiler (as it stands, we only have the location of the syntax rather than the syntax itself). This should then robustly serve the need that is currently fulfilled by our ever-popular MVP ✨ de-expander ✨ (it was simultaneously the Minimum Viable Product and voted the Most Valuable Player of the Qi 4 release), and could be a pattern that other languages using Syntax Spec could follow.
One of the main places where we need a handle to the source syntax is in generating error messages in deforested functional sequences on lists (i.e. map, filter, foldl, etc.). But in deforesting more instances of stream consumers, we've run into some issues.
Specifically, list-ref is functionally identical to interfaces like cadr, caadr, and so on, as well as to first, second, etc., and so, Dominik is writing a common stream consumer to implement these. But it has proven tricky to use the low level contract API to implicate source syntax here, and it may be necessary to construct the more primitive blame objects. But so far, there have been no other distinct examples of consumers that could help to clarify what the general approach should be here.
In trying to identify other examples, we noticed that take is an example of a transformer that (unlike other transformers) can produce a runtime error, and in this respect is similar to consumers, so it could prove to be useful to consider in this connection.
Some others could be: drop, split-at, take-right, drop-right, split-at-right, list-tail?, list-set, list-update.
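The sense in which take resembles a consumer like list-ref can be seen with the ordinary racket/list functions, before any deforestation is involved. In this small sketch, outcome is a hypothetical helper that reports whether a computation raised a contract error.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical helper: run a thunk and report whether it completed
;; or raised a contract error.
(define (outcome thunk)
  (with-handlers ([exn:fail:contract? (λ (e) 'contract-error)])
    (thunk)
    'ok))

(define xs '(1 2 3))

;; A transformer like map does not depend on the length of the list.
(outcome (λ () (map add1 xs)))

;; A consumer like list-ref fails when the index is out of range ...
(outcome (λ () (list-ref xs 5)))

;; ... and take fails in the same way when the list is too short.
(outcome (λ () (take xs 5)))
```

A deforested implementation would need to raise comparable errors for the last two cases while implicating the source syntax written by the user.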
Currently in our deforestation tests, we employ crude string matching to check whether an expression is deforested or not. This isn't reliable or convenient enough for development and we agreed that we need better ways to test whether expressions are being deforested. The logging approach employed in Typed Racket could be an option to look into.
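As one possible shape for a logging-based test, here is a minimal sketch using Racket's standard logging facilities. It is not taken from Typed Racket or Qi: the logger name deforest, the function hypothetical-optimize, and the helper logged-messages are all invented for the example, and a real compiler pass would log from wherever it actually rewrites an expression.

```racket
#lang racket/base
(require racket/logging)

;; Hypothetical logger; this defines deforest-logger along with
;; log-deforest-info and friends.
(define-logger deforest)

;; Stand-in for a compiler pass that logs whenever it rewrites
;; an expression.
(define (hypothetical-optimize expr)
  (log-deforest-info "deforested: ~s" expr)
  expr)

;; Hypothetical helper: run a thunk and collect the messages logged to
;; deforest-logger at level 'info or above while it runs.
(define (logged-messages thunk)
  (define msgs '())
  (with-intercepted-logging
    (λ (event) (set! msgs (cons (vector-ref event 1) msgs)))
    thunk
    #:logger deforest-logger
    'info)
  (reverse msgs))

;; A test can then assert on the log instead of matching on the
;; printed form of the compiled code.
(length (logged-messages (λ () (hypothetical-optimize '(map f xs)))))
```

Compared with string matching on the expansion, a test like this depends only on the pass announcing what it did, not on the exact shape of the generated code.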
(Some of these are carried over from last time)
- Report the sandbox issue on the macro-debugger repository.
- Improve unit testing infrastructure for deforestation.
- Schedule a discussion for Qi's theory of effects and merge the corresponding PR.
- Schedule a discussion for the language composition proposal and implement a proof of concept.
- Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
- Deforest other racket/list APIs via qi/list.
- Decide on whether there will be any deforestation in the Qi core, upon (require qi) (without (require qi/list)).
- Review and merge the fixes for premature termination of compiler passes.
- Continue investigating options to preserve or synthesize the appropriate source syntax through expansion for blame purposes.
Dominik, Michael, Sid