Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nonterminal blues #143

Conversation

countvajhula
Copy link
Collaborator

@countvajhula countvajhula commented Dec 28, 2023

Summary of Changes

This began as an investigation of the long-functional-pipeline benchmark, which it turned out wasn't getting deforested. That seems to have something to do with the fact that that benchmark is rewritten by both of our current optimization passes, normalization and deforestation, and most likely, the latter doesn't see the nonterminal syntax property for reasons that are somewhat mysterious because we weren't explicitly propagating it, and the syntax property being "nonpreserved" meant that our tests were not correctly handling this property across phases (see comments below).

In the course of investigating I ended up refactoring a lot of tests, and they actually revealed a lot of other smaller issues that are mostly fixed. But the main nonterminal property issue remains a mystery. Summary below:

  • The tests pass when using racket, but fail when using racket -y. In the latter case, the nonterminal property is not present after expansion.
  • We have a failing test in compiler/rules.rkt that passes if we manually introduce the nonterminal property between passes viatag-form-syntax
  • Yet, adding the property to the compiler.rkt module itself, at the conclusion of normalize-pass, does not make a difference to the test result. Adding the property at the find-and-map/qi level (after excluding non-syntax and expressions tagged #%host-expression -- which it didn't like) doesn't make a difference to the test result either.

Other changes:

  • Use on-demand expansion pervasively in the tests instead of hand-writing core expressions. This is much better and revealed some bugs in normalization
  • Added some macros that make the tests much easier to read and write (and this revealed that many tests were actually identical (copypasta?) even though their descriptions explained what they were supposed to test - fixed)
  • Added a failing test to reveal the new multi-pass bug with propagating the nonterminal property
  • Rename a util module to (compiler) pass. There are too many modules named "util" and obviously we should avoid such a generic name.

Public Domain Dedication

  • In contributing, I relinquish any copyright claims on my contribution and freely release it into the public domain in the simple hope that it will provide value.

(Why: The freely released, copyright-free work in this repository represents an investment in a better way of doing things called attribution-based economics. Attribution-based economics is based on the simple idea that we gain more by giving more, not by holding on to things that, truly, we could only create because we, in our turn, received from others. As it turns out, an economic system based on attribution -- where those who give more are more empowered -- is significantly more efficient than capitalism while also being stable and fair (unlike capitalism, on both counts), giving it transformative power to elevate the human condition and address the problems that face us today along with a host of others that have been intractable since the beginning. You can help make this a reality by releasing your work in the same way -- freely into the public domain in the simple hope of providing value. Learn more about attribution-based economics at drym.org, tell your friends, do your part.)

Put these in a dedicated module where they're usable both in the
compiler itself as well as in tests.
This was formerly a utility in a single test module, but it's likely
to be broadly useful in testing.
This adds a `test-compile` macro that accepts surface syntax, expands
it, and applies the indicated optimization passes.
This uses on-demand invocation of the expander to generally avoid
writing core language expressions by hand. There were also many tests
that had the right descriptions but which were all identical. This
fixes them to reflect the actual cases they meant to test.
This avoids the need to hand-write core language expressions, and
avoids the need to manually attach the `nonterminal` syntax
property. This also turns out to be more reliable since the
hand-written input expressions may not actually be encountered in
practice, whereas the ones produced by the expander from surface
syntax are guaranteed to be.
This is necessary to get the tests to pass when executed using `racket
-y`. They already pass when using just `racket`.
... when it was formerly mysteriously not working. I think that's one
less mystery to worry about, but it might be two mysteries.
@michaelballantyne
Copy link
Collaborator

The problem occurs because of an interaction between the way that the tests are written and the fact that we made the nonterminal syntax property non-preserved.

When applying a syntax property with syntax-property, there's an optional third argument that indicates whether the property should be preserved in serialized syntax, i.e. when a compiled file is written out by raco make or racket -y.

If a syntax property is really only needed temporarily for compilation, making it non-preserved can avoid increasing the size of compiled files. So as of right now we made the nonterminal property non-preserved.

In the tests as written, phase0-expand-flow expands the initial syntax at expansion time of the test module, but the remainder of the test (eg. calling deforest-pass and deforested?) happens at run-time. That is, this test:

(deforested? (syntax->datum
              (deforest-pass
                (phase0-expand-flow
                 #'(>< (~>> (filter odd?) (map sqr)))))))

Expands to something like:

(deforested? (syntax->datum
              (deforest-pass
                #'(amp
                   (thread
                    (#%blanket-template
                     ((#%host-expression filter)
                      (#%host-expression odd?)
                      __))
                    (#%blanket-template
                     ((#%host-expression map)
                      (#%host-expression sqr)
                      __)))))))

and then that runs at runtime. The problem is that because the nonterminal property is not preserved, when running with racket -y the property is lost after expansion time and is thus missing when running deforest-pass at runtime.

I suggest instead running the entire compiler pipeline at expansion time, and only returning a boolean to runtime. This makes sure that the compiler runs in the same way it will in a real program---at phase 1, and without any step of serialization between expansion and compilation. Here's a test that works file with both racket and racket -y:

#lang racket/base

(require
  (for-syntax racket/base racket/string)
  qi/flow/core/compiler
  qi/flow/core/deforest
  ;; necessary to recognize and expand core forms correctly
  qi/flow/extended/expander
  ;; necessary to correctly expand the right-threading form
  qi/flow/extended/forms      
  (submod qi/flow/extended/expander invoke)
  syntax/macro-testing
  rackunit)

(begin-for-syntax
  (define (deforested? exp)
    (string-contains? (format "~a" exp) "cstream")))

(check-true
 (phase1-eval
  (deforested? (syntax->datum
                (deforest-pass
                  (expand-flow
                   #'(>< (~>> (filter odd?) (map sqr)))))))))

@countvajhula
Copy link
Collaborator Author

@michaelballantyne Ok, thanks for the explanation! That definitely helps to understand what's going on. I'll have to think about how to refactor the tests to do the checks in phase 1, and I think if I can get this test to pass, that should help reveal how to address the original problem with the long-functional-pipeline benchmark.

This is a provisional fix for the multi-pass issue revealed by the
`long-functional-pipeline` benchmark, where a nontrivial normalization
was resulting in syntax that no longer had the `nonterminal` property,
preventing deforestation from being applied.
@countvajhula
Copy link
Collaborator Author

I've fixed that specific test to run in phase 1, verified that it is capable of detecting the failure to propagate nonterminal across normalization → deforestation, and then added the fix, which was to simply attach the property to the top level flow expression that is being compiled.

I'm assuming that:

  1. Since it is a toplevel expression, it is safe to just reattach the nonterminal property after normalization, without first checking whether the expansion had that property attached to begin with.
  2. Since find-and-map constructs transformed syntax using datum->syntax, passing the input syntax for the srcloc and prop arguments (thanks @benknoble for reminding me of this!), that every component syntax object that is modified already propagates the nonterminal property, so that it's only the toplevel expression that a priori doesn't propagate the property (which it now does).

Does that sound right?

One general impression is that although the addition of the syntax property appeared a simple solution at first, it does make things a bit harder to reason about. We should revisit this at some point (after the release 😄)

Now I need to push the remaining tests into phase 1, and I think the PR should be ready at that point.

One "Interesting" thing @dzoep : long-functional-pipeline now does get deforested (yay!), but it's only about 1.5x faster than Racket, when I think we are expecting about length-of-the-pipeline times faster, in this case 6-7, depending on whether values is normalized away by Racket as we do it).

@benknoble
Copy link
Collaborator

  1. Since find-and-map constructs transformed syntax using datum->syntax, passing the input syntax for the srcloc and prop arguments (thanks @benknoble for reminding me of this!), that every component syntax object that is modified already propagates the nonterminal property, so that it's only the toplevel expression that a priori doesn't propagate the property (which it now does).

Isn't find-and-map pretty crucial to the compiler or expander? I'm actually very shocked it needs to use datum->syntax this way, then, and think that suggests a possible [architectural? design? implementation?] flaw (because I would expect Racket macros and other compile-time code to not need that kind of low-level tool in most cases).

I'll have to really study the code [again] to figure out why it's needed though, so I'm mostly saying this now as a way of "please publicly hold me accountable to investigate after the release."

@benknoble
Copy link
Collaborator

I'm also not seeing any changes to find-and-map in this PR; was that in a previous PR or am I missing something?

@countvajhula
Copy link
Collaborator Author

Well, I think find-and-map was a simple way to do a syntax tree traversal that happened to suffice for our needs (until we hit this particular issue!). My understanding is that the "right way" would be to do a proper core grammar aware tree-traversal, using something like syntax-parse (which I believe this TODO notes, and which I think you're getting at), but if we did that, we would essentially be duplicating the entire core language grammar in the compiler, which we've already notated in the expander. I think we're hoping that Syntax Spec would be able to infer such traversal utilities that could be used down the line (correct me if I'm wrong Michael) from the core grammar specification, to avoid this kind of duplication. And in that case, it would be unnecessary effort to implement such a Qi-specific traversal now since we would get it "for free" later anyway in generic form.

And yes, I should have clarified that this PR doesn't modify find-and-map, just normalize-pass which uses it.

I'll gladly hold you accountable to implement the traversal if Syntax Spec doesn't get around to it quickly enough 😝

@michaelballantyne
Copy link
Collaborator

I think a generic syntax traversal using datum->syntax is a totally reasonable thing to do here, but it should probably ultimately live in the syntax-spec library rather than in the Qi repo.

@benknoble
Copy link
Collaborator

In case I happen to have time to pole around: does syntax-spec expose the grammar anywhere in a way that I can compute over it? Or where in its internals should I look?

The `nonterminal` syntax property attached by Syntax Spec is
"non-preserved," so the property would not be present at phase 0 if
the code is compiled. We ensure that the compiler rules being tested
are applied in phase 1.
@michaelballantyne
Copy link
Collaborator

In case I happen to have time to pole around: does syntax-spec expose the grammar anywhere in a way that I can compute over it? Or where in its internals should I look?

It does not expose it, and currently the only thing it generates from the grammar is the macro expander so it also does not have a super clean internal representation of the grammar either. The syntax classes for matching the grammar declarations are here:

https://github.com/michaelballantyne/syntax-spec/blob/15d8dd1c4999c43547671a3ae877b2fe7a74a9d9/private/syntax/syntax-classes.rkt#L103

And the generation of the expander from the grammar is here:

https://github.com/michaelballantyne/syntax-spec/blob/15d8dd1c4999c43547671a3ae877b2fe7a74a9d9/private/syntax/compile/nonterminal-expander.rkt#L25

Retain version 8.9 in the test matrix for both CS and BC, and 8.5
specifically for CS.
@countvajhula
Copy link
Collaborator Author

Yesterday we noticed that CI was failing, specifically on BC versions 8.5 through 8.8. We generally agreed that supporting BC is not a priority, so I've just updated the test matrix to reflect the actual current compatibility -- that is, BC/CS 8.9+, and CS all the way back to 8.5 (i.e. unchanged from before).

These are the specific errors, for reference, in case anyone thinks they might be cause for concern:

BC 8.5-8.7:

	write: cannot marshal value that is embedded in compiled code
	  value: (srcloc #<path:/home/runner/work/qi/qi/qi-lib/flow/core/normalize.rkt> 74 7 2297 22)
	  compilation context...:
	   /home/runner/work/qi/qi/qi-test/tests/qi.rkt
	  context...:
	   /usr/share/racket/collects/compiler/private/cm-minimal.rkt:808:8

BC 8.8:

	mask-accessor: contract violation
	raco setup: 1 making: <pkgs>/rackunit-abbrevs/scribblings
	  expected: mask?
	  given: #f
	  compilation context...:
	   /home/runner/.local/share/racket/8.8-bc/pkgs/rackunit-abbrevs/private/test-typed-rackunit-abbrevs.rkt
	  context...:
	   /usr/share/racket/pkgs/typed-racket-lib/typed-racket/types/overlap.rkt:48:0: overlap?

I think this PR is ready to go! Any review appreciated. I'll aim to merge it soon if there are no further comments.

@michaelballantyne
Copy link
Collaborator

michaelballantyne commented Dec 30, 2023 via email

@countvajhula
Copy link
Collaborator Author

Yeah, if it's an easy fix we might as well do it. From googling the error, it sounds like it might be a case of "3d syntax" which I gather is best avoided as it loses the separate compilation guarantee, which I've come to really appreciate as the codebase gets larger!

@countvajhula
Copy link
Collaborator Author

I haven't looked into the srcloc issue yet, but so as not to hold up the other fixes in this PR, I've just modified the test workflow to run tests on BC 8.5 but not block other tests from running if it fails, so we see that the issue remains but it doesn't break CI. We can fix the remaining issue in a separate PR, so I'll merge this now!

@countvajhula countvajhula merged commit cad6c0e into drym-org:lets-write-a-qi-compiler Jan 3, 2024
5 of 7 checks passed
@countvajhula countvajhula mentioned this pull request Jan 3, 2024
29 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants