Qi Meeting Mar 29 2024

Breaking Out of the Sandbox

Qi Meeting Mar 29 2024

Adjacent meetings: Previous | Up | Next

Summary

We looked at a bug that was recently discovered where Scribble docs using packages depending on Qi were failing to build in some cases. It turned out to be related to the interaction between a sandboxed evaluator used in docs and the use of the macro debugger API in the Qi compiler. We also continued the discussion on the syntax "blame game" and made progress on what is shaping up to be a reliable and robust solution. We also discussed some challenges in generating appropriate errors for stream consumers in racket/list for the purposes of deforestation.

Background

Some time ago, we noticed that docs for some packages depending on Qi were failing to build on the Racket Package Index.

We've been talking about ways to identify culpable syntax for a few weeks now and hadn't found any great answers, and had resigned ourselves to simply cultivating a greater appreciation for our existing "hacky" yet usable de-expander.

We've started looking into deforestation of more racket/list APIs. Consumers like list-ref are proving to be tricky as far as generating good error messages.

Failures in Building Scribble Docs

Some time ago, we noticed that Scribble docs for some packages depending on Qi were failing to build on the package index, with an error resembling:

dynamic-require: name is protected
  name: 'syntax-local-expand-observer
  module: #<resolved-module-path:'#%expobs>
  context...:
   body of "/Applications/Racket-Latest/share/pkgs/macro-debugger/macro-debugger/emit.rkt"
   /Applications/Racket-Latest/share/pkgs/sandbox-lib/racket/sandbox.rkt:754:18

This certainly looked like it could have something to do with Qi's use of the Macro Debugger API. We use it to report compiler tranformations as stages of "expansion" so that we can have some visibility into them (e.g. in the Macro Stepper) for debugging purposes, but we were not able to reproduce it locally at the time, so our tacit assumption was that the cause must be in some other dependency of these packages, or perhaps in these packages themselves. But in attempting to work around these downstream docs issues recently, Sid was able to reproduce the issue as having its source in Qi after all, as there was an old use of the macro debugger library that had presumably not been disabled last time we tried to reproduce the error.

Investigating, we discovered that simply requiring the macro-debugger module was causing the error to happen (in downstream packages), even if we didn't actually use any APIs defined there! We traced it to this expression in macro-debugger/emit.rkt:

(define syntax-local-expand-observer
  (dynamic-require ''#%expobs 'syntax-local-expand-observer))

No Playing Rough in the Sandbox!

All of these packages that were failing had one thing in common -- they all use a sandboxed evaluator to evaluate examples, and this evaluator required libraries that ultimately depended on Qi.

It turns out that some time ago, Michael and others had discovered a number of security vulnerabilities in the sandboxed evaluator that would allow sandboxed code to escape the constraints that were intended to be placed on it. For instance, the use of unsafe-car is disallowed as it could crash the machine, but there could be ways to use trusted code, such as Typed Racket, to find a back door into using unsafe code. Due to the seemingly unending series of issues that were reported at the time, Matthew decided that the safest way to prevent such issues was to disallow any macro reflection within the sandbox. This includes things like local-expand and syntax-local-value.

Another such module providing macro reflection is expobs, which is a module in the Racket expander that records expansion steps. Dynamically requiring this module in the emit.rkt module thus causes the sandbox to raise an error.

Addressing the Issue

There are a few ways to address this.

Avoid the dynamic-require

First, the upstream emit.rkt module could perhaps avoid the dynamic require. If it were a static require, the error would not happen [TODO: why not?]. In principle, the use of the emit event API should be considered safe by the sandbox since we give it syntax and it isn't accessing syntax.

Add another dynamic-require!

A workaround that we could do on the Qi side is to wrap the dynamic require with yet another dynamic require and catch the exception resulting from the original use, suppressing it.

Dominik created a PR to do this and it was merged soon after the meeting.

Tracking Source Syntax

In recent discussions, we've talked about ways to track source syntax entered by the user so that we can implicate it in runtime error messages when appropriate (otherwise, the Racket evaluator would implicate the compiled target expression rather than the source expression written by the user). We seemed to converge on the idea of a "language boundary" that would either need to be defined explicitly by the language by means of syntax properties or implicitly by careful use of syntax/loc.

First, why isn't syntax/loc always used instead of syntax? We noted that syntax-rules already does implicitly use syntax/loc, whereas syntax-case and syntax-parse do not. We figured this must be because in the former case there is always an explicit reference to a template and it is always used in the context of macroexpansion, whereas a utility like syntax-parse could be used in a broader range of settings where this default may make less sense. But also, it wouldn't make sense for user macros (rather than built-in macros) to implicitly use syntax/loc as that would result in the reverse problem where we only have the initial syntax, as opposed to our current problem of having only the final syntax.

Returning to the idea of a language boundary as the point at which we transition from using one to using the other, we wondered whether this boundary is a well-defined and perhaps missing abstraction in Racket error reporting today. It seems to be the case that languages should generally always employ syntax/loc rather than syntax in their implementation, thereby preserving a reference to the last syntax encountered during expansion that was entered by the user rather than expanded by a built-in macro of the language. In that case, should there perhaps be some APIs in place to make defining and working with this boundary easier?

We felt that we should continue to develop the current work in Syntax Spec and Qi into a proper solution, and then see whether it would be useful to generalize it further.

Macros to simplify use of syntax/loc

The first part of the solution involves writing macro-defining macros for internal use in the implementation of the language. That is, we already provide define-qi-syntax-rule and define-qi-syntax-parser for users to define Qi macros, and we use these same interfaces to define built-in Qi macros like switch. These employ syntax in their templates and so each expansion loses the source location information of the preceding expression. We need analogous forms that will use syntax/loc instead of syntax to preserve the preceding source location instead, and we could then use these to define any built-in Qi macros.

Channeling Linus Torvalds

With Michael directing, we came up with:

(define-syntax define-core-qi-syntax-rule
  (syntax-parser
    [(_ (name . pat) template)
     #'(define-qi-syntax name
         (qi-macro
          (syntax-parser
            [(_ . pat) (syntax/loc this-syntax
                         template)])))]))

Unfortunately, it's not straightforward to do this for the "parser" form, since we don't have an explicit reference to the template in that case that we could wrap with syntax/loc, as we do here. To address this, we employ a kind of "decorator" (we "eta expand the macro transformer procedure") to wrap the parser with code that propagates the source location. Here's what we came up with:

(define-syntax define-core-qi-syntax-parser
  (syntax-parser
    [(_ name clause ...)
     #'(define-qi-syntax name
         (qi-macro
          (propagate-syntax-loc
           (syntax-parser
             clause ...))))]))

… that is, the propagate-syntax-loc is added here.

By adding this "decorator," we gain access to the syntax object whose source location we wish to propagate to the expansion:

(define (propagate-syntax-loc f)
  (λ (stx)
    (let ([res (f stx)])
      (datum->syntax res  ; lexical context
        ;; datum
        (syntax-e res)
        ;; for srcloc
        stx
        ;; for properties
        res)))))

But this doesn't correctly handle the case where an expansion is contained in the source expression, e.g. (and g) → g. Here, it would propagate the source location of (and g), but it would be better to use the expansion in this case, g, since it can be safely inferred as the more specific source of any resulting errors. For example, in (and (or (and (or (and (or an-expression)))))), all of the wrapping boolean forms simply disappear in the final expansion leaving just an-expression. If this ends up producing a syntax error, it would be most useful to implicate an-expression instead of the entire expression. On the other hand, an expansion like (mac a b) → (mac2 a) would not qualify since the parse isn't itself contained in the source.

We handle this with the following modification:

(define (source-location-contained? inner outer)
  (and (equal? (syntax-source inner)
               (syntax-source outer))
       (>= (syntax-position inner)
           (syntax-position outer))
       (<= (+ (syntax-position inner)
              (syntax-span inner))
           (+ (syntax-position outer)
              (syntax-span outer)))))

;; Example: (and g) → g
;; This would naively highlight (and g), but in this case
;; we want to highlight g instead. So, we check whether
;; one expression is contained in the other, and if so,
;; keep the srcloc of the inner one, to handle this.
(define (propagate-syntax-loc f)
  (λ (stx)
    (let ([res (f stx)])
      (datum->syntax res  ; lexical context
        ;; datum
        (syntax-e res)
        ;; for srcloc
        (if (source-location-contained? res stx)
            res
            stx)
        ;; for properties
        res)))))

We then redefined all built-in Qi macros to use these macros, rebuilt the code and ran tests, and everything was still working!

There's an old video online featuring Linus Torvalds introducing Git, and he says something like, "No one writes code that works the first time ... except me, and there's only one me." Well, I think we proved there's more than one Linus in town!

Putting It Together

Now, it only remains to use this together with the APIs for querying surface syntax that we started to add to Syntax Spec recently, to actually get a reference to the user-entered surface syntax in generating errors in the compiler (as it stands, we only have the location of the syntax rather than the syntax itself). This should then robustly serve the need that is currently fulfilled by our ever-popular MVP ✨ de-expander ✨ (it was simultaneously the Minimum Viable Product and voted the Most Valuable Player of the Qi 4 release), and could be a pattern that other languages using Syntax Spec could follow.

Challenges in Consumer Error Messages

One of the main places where we need a handle to the source syntax is in generating error messages in deforested functional sequences on lists (i.e. map, filter, foldl, etc.). But in deforesting more instances of stream consumers, we've run into some issues.

Specifically, list-ref is functionally identical to interfaces like cadr, caadr, and so on, as well as to first, second, etc., and so, Dominik is writing a common stream consumer to implement these. But it has proven tricky to use the low level contract API to implicate source syntax here, and it may be necessary to construct the more primitive blame objects. But so far, there have been no other distinct examples of consumers that could help to clarify what the general approach should be here.

In trying to identify other examples, we noticed that take is an example of a transformer that (unlike other transformers) can produce a runtime error, and in this respect is similar to consumers, so it could prove to be useful to consider in this connection.

Some others could be: drop, split-at, take-right, drop-right, split-at-right, list-tail?, list-set, list-update.

Better Tests Needed for Deforestation

Currently in our deforestation tests, we employ crude string matching to check whether an expression is deforested or not. This isn't reliable or convenient enough for development and we agreed that we need better ways to test whether expressions are being deforested. The logging approach employed in Typed Racket could be an option to look into.

Next Steps

(Some of these are carried over from last time)

Report the sandbox issue on the macro-debugger repository.
Improve unit testing infrastructure for deforestation.
Schedule a discussion for Qi's theory of effects and merge the corresponding PR.
Schedule a discussion for the language composition proposal and implement a proof of concept.
Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
Deforest other racket/list APIs via qi/list
Decide on whether there will be any deforestation in the Qi core, upon (require qi) (without (require qi/list))
Review and merge the fixes for premature termination of compiler passes.
Continue investigating options to preserve or synthesize the appropriate source syntax through expansion for blame purposes.

Attendees

Dominik, Michael, Sid

Provide feedback

Saved searches

Use saved searches to filter your results more quickly